WO2020216089A1 - Voice control system and method, and voice suite and voice apparatus - Google Patents

Voice control system and method, and voice suite and voice apparatus

Info

Publication number
WO2020216089A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
processing device
server
voice input
communication module
Application number
PCT/CN2020/084427
Other languages
French (fr)
Chinese (zh)
Inventor
孙大鹏
贾伟
赵敏
Original Assignee
阿里巴巴集团控股有限公司
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2020216089A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/2803 Home automation networks
    • H04L12/2816 Controlling appliance services of a home automation network by calling their functionalities
    • H04L12/282 Controlling appliance services of a home automation network by calling their functionalities based on user interaction within the home
    • G PHYSICS
    • G08 SIGNALLING
    • G08C TRANSMISSION SYSTEMS FOR MEASURED VALUES, CONTROL OR SIMILAR SIGNALS
    • G08C23/00 Non-electrical signal transmission systems, e.g. optical systems
    • G08C23/04 Non-electrical signal transmission systems, e.g. optical systems using light waves, e.g. infrared
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/80 Services using short range communication, e.g. near-field communication [NFC], radio-frequency identification [RFID] or low energy communication
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/02 Power saving arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/02 Power saving arrangements
    • H04W52/0209 Power saving arrangements in terminal devices
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/2803 Home automation networks
    • H04L2012/284 Home automation networks characterised by the type of medium used
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • This application relates to the field of information technology, and in particular to a voice control system and method, as well as a voice kit and a voice device.
  • Voice control can be realized by a smart speaker acting as a control node in the home, or by an electrical appliance that itself has a voice interaction function.
  • Devices with voice interaction functions are usually equipped with specialized voice processing modules. These voice processing modules can only be arranged inside the device. On the one hand, the cost of tooling (mold making) and separate debugging is high. On the other hand, due to the constraints of the equipment assembly, they cannot be placed flexibly and have poor adaptability.
  • This application proposes a separate voice device as an entry point for intelligent voice.
  • The voice device collects voice input and sends it to the processing device, which implements semantic analysis and corresponding command issuance locally or in the cloud, thus facilitating flexible control of various types of equipment, especially smart home equipment and even traditional equipment.
  • A voice control system is provided, including a plurality of voice kits and a server communicating with the voice kits, wherein each voice kit includes: a voice device for collecting voice input and sending the collected voice input to a processing device; and the processing device, for receiving the voice input collected by the voice device and sending the voice input to the server; and the server is used to perform semantic recognition on the voice input sent by the processing device, so as to generate and issue operation commands for the target device operation corresponding to the recognized semantics.
  • the voice kit may include a plurality of the voice devices arranged in different areas, and each processing device is communicatively connected with one or more of the voice devices within a communication range.
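As a rough illustration of how voice devices in different areas might be paired with processing devices, the sketch below links each voice device to the nearest processing device within a given communication range. The 2-D coordinates, the `range_m` parameter, the nearest-device rule and all device names are assumptions made for illustration; the application only states that each processing device is connected with one or more voice devices within communication range.

```python
import math
from typing import Dict, List, Tuple

Position = Tuple[float, float]  # hypothetical (x, y) position in metres


def connect_in_range(
    processing_devices: Dict[str, Position],
    voice_devices: Dict[str, Position],
    range_m: float,
) -> Dict[str, List[str]]:
    """Link every voice device to the closest processing device within range_m."""
    links: Dict[str, List[str]] = {name: [] for name in processing_devices}
    for v_name, v_pos in voice_devices.items():
        best_name, best_dist = None, float("inf")
        for p_name, p_pos in processing_devices.items():
            dist = math.dist(v_pos, p_pos)
            if dist <= range_m and dist < best_dist:
                best_name, best_dist = p_name, dist
        if best_name is not None:  # voice devices out of range stay unlinked
            links[best_name].append(v_name)
    return links


links = connect_in_range(
    {"hub-living-room": (0.0, 0.0), "hub-bedroom": (12.0, 0.0)},
    {
        "sticker-kitchen": (3.0, 4.0),
        "sticker-bedroom": (11.0, 1.0),
        "sticker-garage": (40.0, 40.0),
    },
    range_m=10.0,
)
```

In this example the garage sticker is out of range of both processing devices and therefore remains unconnected.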
  • the voice device and the processing device perform short-distance communication; for example, each includes a low-power short-distance communication module for local short-distance communication with the other, and the communication module includes at least one of the following: a Bluetooth communication module that communicates with the processing device using Bluetooth technology; an infrared communication module that communicates with the processing device based on infrared technology; and a Zigbee communication module that communicates with the processing device based on Zigbee technology.
  • the server can be implemented as a remote server, so the processing device can include a WiFi module for remote communication with the server.
  • the voice device recognizes a wake-up word from the collected voice input, and sends the collected voice input to the processing device after the wake-up module recognizes the wake-up word.
  • the voice device performs voice output on the content received from the processing device and/or the target device via its own speaker or a wired/wireless external speaker.
  • the content of the voice output may include at least one of the following: the statement content of the execution command; and the interaction content with the user, for example, further obtaining the missing semantic elements.
  • the voice device is also used to perform infrared output of the command received from the processing device, so as to realize the operation of the target device corresponding to the command.
  • the voice device further includes an attachment mechanism for attaching to the wall, the processing device, the target device, or the surface of other facilities.
  • the processing device is configured to perform semantic recognition for at least part of the voice input, and to generate operation commands for the target device operation corresponding to the recognized semantics.
  • the target device may directly receive the issued operation command from the server and execute the operation corresponding to the operation command; and/or the processing device may receive the issued operation command from the server and issue it to the target device itself or via the voice device.
  • the target device that directly receives and executes the issued operation command from the server may include a networked smart home appliance; and the target device that obtains the operation command via the processing device or the voice device may include a traditional home appliance.
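The delivery paths described above can be sketched as a small routing decision. This is only an illustrative sketch: the `Route` names, the three boolean inputs and the priority order among the paths are assumptions, not something the application specifies.

```python
from enum import Enum


class Route(Enum):
    DIRECT_FROM_SERVER = "server -> target device"
    VIA_PROCESSING_DEVICE = "server -> processing device -> target device"
    VIA_VOICE_DEVICE = "server -> processing device -> voice device -> target device"


def choose_route(networked: bool, processing_device_can_reach: bool,
                 voice_device_can_reach: bool) -> Route:
    """Pick a delivery path for an operation command (hypothetical priority order)."""
    if networked:
        # Networked smart home appliances receive the command straight from the server.
        return Route.DIRECT_FROM_SERVER
    if processing_device_can_reach:
        # Traditional appliance that the processing device can control itself.
        return Route.VIA_PROCESSING_DEVICE
    if voice_device_can_reach:
        # Traditional appliance controlled by the voice device, e.g. by infrared output.
        return Route.VIA_VOICE_DEVICE
    raise ValueError("no delivery path for this target device")
```

For example, a traditional infrared-controlled television that only the voice device can reach would be served by `Route.VIA_VOICE_DEVICE`.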
  • the server may include: a semantic processing server for semantic recognition of uploaded voice input; a command generation server for generating operation commands for target device operations based on the recognized semantics; and a command issuing server for issuing the operation commands.
  • the command generation and issuance for some target devices can be completed by an external server.
  • the server may include a semantic processing server for semantic recognition of the uploaded voice input, and the server sends the recognized semantics to an external server, wherein the external server includes a command generation server for generating operation commands for target device operations based on the recognized semantics, and a command issuing server for issuing the operation commands.
  • the server may obtain in advance at least one of the following pieces of local device configuration information: the distribution and device information of the voice device, the processing device, and/or at least part of the target devices; and the correspondence between at least two of the voice device, the processing device, and the target device.
  • the server automatically fills in semantic elements that are missing in the recognition semantics for performing operations on the target device based on the local device configuration information.
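A minimal sketch of such automatic filling, assuming the recognized semantics are a dictionary of slots and the local device configuration maps each voice device to a room and to the target devices near it. The slot names (`room`, `device_type`, `target_id`), the configuration layout and all identifiers are hypothetical.

```python
# Hypothetical local device configuration obtained by the server in advance:
# which room each voice device is in and which target devices it corresponds to.
LOCAL_CONFIG = {
    "sticker-kitchen": {
        "room": "kitchen",
        "devices": {"light": "kitchen-light-1", "oven": "oven-1"},
    },
    "sticker-bedroom": {
        "room": "bedroom",
        "devices": {"light": "bedroom-light-1", "air_conditioner": "ac-2"},
    },
}


def fill_missing_elements(semantics: dict, source_voice_device: str) -> dict:
    """Return a copy of the semantics with 'room' and 'target_id' filled in when possible."""
    filled = dict(semantics)
    entry = LOCAL_CONFIG.get(source_voice_device)
    if entry is None:
        return filled  # unknown voice device: nothing to infer
    if filled.get("room", entry["room"]) != entry["room"]:
        return filled  # the user named another room explicitly: do not guess
    filled["room"] = entry["room"]
    if "target_id" not in filled:
        target = entry["devices"].get(filled.get("device_type"))
        if target is not None:
            filled["target_id"] = target
    return filled
```

With this configuration, "turn on the light" spoken to the kitchen sticker would resolve to `kitchen-light-1` without the user naming the room.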
  • the server may include a local server that communicates with the voice kit at close range, and the local server may be used to perform semantic recognition for at least part of the voice input, generate operation commands for the target device operation corresponding to the recognized semantics, and issue the operation commands.
  • A voice kit is provided, including: a voice device for collecting voice input and sending the collected voice input to a processing device via local communication; and a processing device including a communication unit communicatively connected to the voice device, wherein the processing device receives voice data collected by the voice device via the communication unit, so as to realize semantic recognition of the voice input and the target device operation corresponding to the recognized semantics.
  • the voice device and the processing device may each include a low-power short-range communication module for performing short-range communication with each other, and the communication module includes at least one of the following: a Bluetooth communication module that communicates with the processing device based on Bluetooth technology; an infrared communication module that communicates with the processing device based on infrared technology; and a Zigbee communication module that communicates with the processing device based on Zigbee technology.
  • the processing device may include: a networking unit for uploading the user's voice data received from the voice device to a server, wherein the server performs semantic recognition for the voice input to generate and issue operation commands for the target device operation corresponding to the recognized semantics. Therefore, by placing the networking function in the processing device, the cloud can be used to realize intelligent AI voice processing while keeping the voice sticker miniaturized.
  • the target device can directly receive the issued operation command from the server and execute the operation corresponding to the operation command; and/or the processing device receives the issued operation command from the server via the networking unit and issues it to the target device itself or via the voice device.
  • different execution methods for cloud-issued commands can be implemented according to different application scenarios: direct control of home appliances, AI module control, voice sticker control, etc.
  • the target device that directly receives and executes the issued operation command from the server may include a smart home appliance with its own network connection; and the target device that obtains the operation command via the processing device or the voice device may include a traditional home appliance.
  • the networking unit may be further used for receiving interactive content with the user issued by the server, and the communication unit is also used for sending the interactive content to the voice device for voice output.
  • the voice entry function of voice stickers is further improved.
  • the voice devices and processing devices in the kit can be freely combined.
  • the kit may include multiple voice devices arranged in different areas, and each processing device is connected to one or more devices in the communication range.
  • the communication connection of miniaturized equipment can be freely combined.
  • the processing device may further include: a voice recognition unit for performing semantic recognition for the voice input; and an operation command generating unit for generating operation commands for the target device operation corresponding to the recognized semantics. This enables fast local processing of simple commands.
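The split between fast local processing and cloud processing can be sketched as a local lookup with a cloud fallback. For brevity the sketch assumes the voice input has already been transcribed to text; the `LOCAL_COMMANDS` table, the function names and the `cloud_recognize` callback are hypothetical.

```python
from typing import Callable, Optional, Tuple

# Hypothetical table of simple commands that the processing device handles itself.
LOCAL_COMMANDS = {
    "turn on the light": {"device_type": "light", "action": "turn_on"},
    "turn off the light": {"device_type": "light", "action": "turn_off"},
}


def recognize_locally(text: str) -> Optional[dict]:
    """Return an operation command for a known simple utterance, else None."""
    return LOCAL_COMMANDS.get(text.strip().lower())


def handle_utterance(text: str, cloud_recognize: Callable[[str], dict]) -> Tuple[str, dict]:
    """Try the local recognizer first; fall back to the server for everything else."""
    local = recognize_locally(text)
    if local is not None:
        return "local", local
    return "cloud", cloud_recognize(text)
```

Only utterances that the local table does not cover incur a round trip to the server.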
  • A voice device is provided, which includes: a microphone for collecting voice input; and a communication module for sending the collected voice input to a processing device, so as to realize semantic recognition of the voice input and the target device operation corresponding to the recognized semantics.
  • the device may include an attachment mechanism for attaching to a wall, a processing device, a target device, or the surface of other facilities, for example, in the form of a smart voice sticker.
  • the device may further include: a wake-up module for recognizing a wake-up word from a voice input from a user, and the communication module is used for sending the collected voice input to the processing device after the wake-up module recognizes the wake-up word.
  • the wake-up module can be implemented by an existing low-power miniaturized DSP, thereby providing a far-field wake-up function while ensuring low power consumption and miniaturization.
  • the device may further include: a speaker for voice output of the content received by the communication module from the processing device.
  • the content of the voice output by the speaker includes at least one of the following: statement content for executing the command; and interactive content with the user, for example, further obtaining missing semantic elements.
  • the voice portal function of the above device is further completed.
  • the device may further include: an external speaker connection module, so that the connected external speaker performs voice output on the content received by the communication module from the processing device.
  • the external speaker connection module is at least one of the following: the communication module including a Bluetooth connection function to connect an external Bluetooth speaker; and an external speaker wired connection module including an audio jack.
  • the device may further include: an infrared module, which is used to perform infrared output on the command received by the communication module from the processing device, so as to realize the operation of the target device corresponding to the command.
  • the device can be multiplexed as a universal infrared remote control for traditional home appliances.
  • the communication module is a low-power short-range communication module, such as a Bluetooth communication module that communicates with the processing device based on Bluetooth technology, an infrared communication module that communicates with the processing device based on infrared technology, or a Zigbee communication module that communicates with the processing device based on Zigbee technology. This makes miniaturization and low-power communication possible.
  • the device may further include: a power supply module, the power supply module includes at least one of the following: a wireless charging assembly; a battery assembly; and a USB socket.
  • the low power consumption of the device can eliminate the need for a power cord, thereby further improving its portability.
  • A voice control method is provided, which can be implemented by the above voice device, kit and system, and includes: the voice device collects voice input; the voice device sends the voice input to the processing device; and the processing device realizes semantic recognition for the voice input and generation of the corresponding target device operation commands.
  • the voice device sending the voice input to the processing device includes: the voice device sending the voice input to the processing device via short-range communication.
  • the processing device realizing semantic recognition for the voice input and generation of the corresponding target device operation command includes: the processing device uploads the voice input to the server; and the server performs semantic recognition on the voice input to obtain the operation command for the target device operation corresponding to the recognized semantics.
  • the processing device uploading the voice input to the server includes: the processing device uploads the voice input to the local server via short-range communication.
  • the server performing semantic recognition on the voice input to obtain the operation command for the target device operation corresponding to the recognized semantics includes: the local server performs semantic recognition for at least part of the voice input, generates the operation command for the target device operation corresponding to the recognized semantics, and issues the operation command.
  • the processing device realizing semantic recognition for the voice input and generation of the corresponding target device operation command includes: the processing device performs semantic recognition for at least part of the voice input, and generates the operation command for the target device operation corresponding to the recognized semantics.
  • the method may further include: the voice device and/or the processing device obtains the operation command for the target device operation corresponding to the recognized semantics; and the voice device and/or the processing device issues the operation command to the target device.
  • the method may further include: the processing device obtains and delivers the voice output content; the processing device delivers the voice output content to the voice device; and the voice device outputs the voice output content by voice.
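The method steps above can be sketched end to end with three stand-in classes. This is a toy model under stated assumptions: voice input is represented as text rather than audio, the keyword matching in `Server.recognize` stands in for real semantic recognition, and every class, method and message is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OperationCommand:
    target: str
    action: str


class Server:
    """Stand-in for semantic recognition and operation command generation."""

    def recognize(self, voice_input: str) -> Optional[OperationCommand]:
        words = voice_input.lower().split()
        if "light" in words and "on" in words:
            return OperationCommand("light", "turn_on")
        if "light" in words and "off" in words:
            return OperationCommand("light", "turn_off")
        return None


@dataclass
class VoiceDevice:
    """Collects voice input and plays back the voice output content it receives."""

    spoken: List[str] = field(default_factory=list)

    def play(self, content: str) -> None:
        self.spoken.append(content)


class ProcessingDevice:
    """Receives voice input, uploads it, and delivers voice output content."""

    def __init__(self, server: Server, voice_device: VoiceDevice) -> None:
        self.server = server
        self.voice_device = voice_device

    def on_voice_input(self, voice_input: str) -> Optional[OperationCommand]:
        command = self.server.recognize(voice_input)  # upload and recognize
        if command is None:
            self.voice_device.play("Sorry, I did not understand that.")
        else:
            self.voice_device.play(f"OK, {command.action} {command.target}.")
        return command


sticker = VoiceDevice()
hub = ProcessingDevice(Server(), sticker)
result = hub.on_voice_input("Turn on the light")
```

Here `result` is the generated operation command, and `sticker.spoken` holds the voice output content delivered back to the voice device.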
  • kits of the present application are convenient to use, and the miniaturized and portable design of the voice module makes it easy to be placed on various surfaces and suitable for use in multiple scenarios.
  • the smart voice sticker of the present application can also realize the function of a universal infrared remote control by adding an infrared module, thereby realizing intelligent control of traditional infrared devices.
  • Fig. 1 shows a schematic diagram of the composition of a voice device according to an embodiment of the present application
  • Fig. 2 shows an example implemented as a smart voice sticker
  • Fig. 3 shows an example of the composition of a smart voice sticker in this application
  • Fig. 4 shows an example of the circuit processing flow of a voice device of the present application
  • Fig. 5 shows a schematic diagram of the composition of a voice kit according to an embodiment of the present application
  • Fig. 6 shows a schematic diagram of the composition of a voice control system according to an embodiment of the present application
  • Fig. 7 shows a schematic flowchart of a voice control method according to an embodiment of the present application
  • Fig. 8 shows an example of voice control in this application.
  • Existing smart home voice interaction solutions (for example, smart speakers and devices with voice interaction functions) have problems such as low flexibility and limited use scenarios.
  • In this application, a separate voice device is proposed as the entry point for intelligent voice.
  • The voice device collects voice input and sends it to the processing device via local communication. The latter implements semantic analysis and corresponding command issuance locally or via the cloud, thus facilitating flexible control of various types of equipment, especially smart home equipment and even traditional equipment.
  • Fig. 1 shows a schematic diagram of the composition of a voice device according to an embodiment of the present application.
  • the voice device 100 may include a microphone (MIC) 110 and a communication module 120.
  • MIC 110 is used to collect voice input.
  • the communication module 120 is used to send the collected voice input to the processing device, so as to realize the semantic recognition of the voice input and the target device operation corresponding to the semantic recognition based on the processing device.
  • the target device can be a smart device or a traditional home appliance.
  • the voice device can be combined with the processing device to form a voice kit (as shown in Figure 5 below).
  • kit refers to a group of devices that assist in achieving specific functions.
  • the voice device, being flexible and easy to arrange, is used as an entry point for collecting voice signals from users in different areas, and the processing device aggregates the voice information collected by the voice device and, through local and/or cloud semantic processing, realizes the semantic recognition of the voice input and the operation of the target device corresponding to the recognized semantics.
  • the above voice kit can also be combined with the server to form a voice control system (as shown in Figure 6 below).
  • the server can be a local server that communicates with the voice kit over a short distance, or it can be a remote end (for example, a server farm) that communicates remotely with the processing devices in multiple voice kits and provides each of them with services such as cloud semantic recognition and command generation.
  • MIC is a transducer that converts sound into electrical signals.
  • the MIC 110 can collect the sound signal sent by the user, convert it into an electric signal containing the user's sound information, and send it to the communication module 120.
  • the communication module 120 may send the electrical signal containing the user's sound information, as the user's voice input data, to the processing device for subsequent semantic recognition of the voice input and the target device operation corresponding to the recognized semantics.
  • the MIC may also collect sound signals emitted by non-users, for example, sound signals emitted by other smart devices.
  • the voice device 100 can achieve low power consumption and miniaturization by retaining only the simplest voice collection and communication functions, which facilitates its flexible installation.
  • the communication module 120 may be a short-range communication module with low power consumption.
  • Here, "short-range communication" refers to short-distance wireless communication whose communication distance is usually within a few hundred meters.
  • the communication module 120 may be a Bluetooth (BT) communication module that communicates with the processing device based on Bluetooth technology, for example, a communication module based on a Bluetooth Mesh solution.
  • the communication module 120 may be an infrared (Infrared, IR) communication module that communicates with the processing device based on infrared technology, for example, a high-speed infrared transmission module.
  • the communication module 120 may be a Zigbee communication module that communicates with the processing device based on Zigbee technology. In other embodiments, the communication module 120 may also use a combination of BT and IR. It should be understood that the communication module 120 of the present application can also be implemented by, for example, a new low-power short-distance communication technology developed in the future, so as to create conditions for its flexible layout through the miniaturization and low power consumption of the device itself.
  • the voice device 100 may also include a WiFi communication module, which has relatively high power consumption and generally requires stronger processing functions, so as to communicate with the processing device over a local area network.
  • a WiFi communication module may also be used for short-distance communication in some embodiments.
  • an infrared module may be included as described above for short-distance low-power communication with the processing device. In other embodiments, it may further include an infrared module for applying operations to the target device, or multiplex the above-mentioned infrared communication module as a command application module. To this end, the infrared module may be used to perform infrared output on the command received by the communication module 120 from the processing device, so as to implement the operation of the target device corresponding to the command.
  • the above-mentioned operations via infrared output are particularly suitable for traditional target devices, such as televisions and air conditioners controlled by infrared remote controllers.
  • the infrared module used as an infrared remote control can generate and send an infrared signal corresponding to the receiving frequency band of the device based on the specific device targeted by the operation. Therefore, the voice device 100 can also be used as a universal infrared remote control. In other embodiments, the voice device can also implement control of the target device based on other technologies (for example, multiplexing Bluetooth communication devices).
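A universal infrared remote of this kind essentially needs a per-device profile that says which signal to emit for which command. The sketch below models that lookup only. The profile table, the carrier frequencies and the code values are placeholders made up for illustration, not codes of any real appliance, and `IrSignal` and `build_ir_signal` are hypothetical names.

```python
from typing import NamedTuple


class IrSignal(NamedTuple):
    carrier_hz: int  # carrier frequency the target device's receiver expects
    code: int        # command code to modulate onto the carrier


# Placeholder profiles; real appliances use manufacturer-specific codes.
IR_PROFILES = {
    "tv": {"carrier_hz": 38_000, "codes": {"power": 0x01, "volume_up": 0x02}},
    "air_conditioner": {"carrier_hz": 36_000, "codes": {"power": 0x10, "cool": 0x11}},
}


def build_ir_signal(device_type: str, command: str) -> IrSignal:
    """Look up the infrared signal matching the target device and the command."""
    profile = IR_PROFILES.get(device_type)
    if profile is None:
        raise KeyError(f"no infrared profile for {device_type!r}")
    try:
        code = profile["codes"][command]
    except KeyError:
        raise KeyError(f"{device_type!r} has no infrared code for {command!r}") from None
    return IrSignal(profile["carrier_hz"], code)
```

The voice device would then hand the resulting signal to its infrared module for transmission, a step that is hardware-specific and not modeled here.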
  • the voice device 100 further includes a power supply module, including but not limited to: a wireless charging component; a battery component; and a USB socket. Since the voice collection and transmission functions consume very little power, the overall power consumption of the voice device 100 is small, and it is suitable for a power supply structure without a power cord. This greatly improves the portability and flexibility of the voice device 100 itself.
  • the voice device 100 of the present application may also include an attachment mechanism for attaching to a wall, a processing device, a target device, or other facility surfaces.
  • Figure 2 shows an example implemented as a smart voice sticker.
  • the voice device 100 of the present application may include, for example, a magnet or an electrostatic adsorption surface as the attachment mechanism 200, so as to be easily attached to the surface of other facilities, such as the side wall of the appliance (for example, a microwave oven or oven) in the figure.
  • the voice device 100 of the present application also preferably includes a remote wake-up function.
  • remote wake-up refers to the way that a voice device can be awakened by a specific voice wake-up word.
  • For example, a commercially available Tmall Genie can be awakened with the wake-up word "Tmall Genie".
  • the voice device 100 may further include a wake-up module for recognizing the wake-up word from the voice input from the user.
  • the communication module 120 can correspondingly send the collected voice input to the processing device after the wake-up module recognizes the wake-up word.
  • the communication module 120 may receive and transmit the voice input of the user after the wake-up word is spoken.
  • the communication module 120 can receive the user's wake-up word itself together with the subsequent voice input and transmit both. Since the wake-up module can be realized by a miniaturized, low-power DSP (digital signal processing) circuit, the addition of the far-field wake-up function will not substantially affect the miniaturization and low power consumption characteristics of the voice device 100.
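The two forwarding variants described above (forwarding only what follows the wake-up word, or forwarding the wake-up word as well) can be sketched as a simple gate. The sketch treats the voice input as a sequence of already-segmented text chunks, which is a simplification; the wake-up word "hey sticker" and the function name are hypothetical.

```python
from typing import Iterable, List


def gate_by_wake_word(chunks: Iterable[str], wake_word: str,
                      include_wake_word: bool = False) -> List[str]:
    """Return the chunks that would be forwarded to the processing device."""
    forwarded: List[str] = []
    awake = False
    for chunk in chunks:
        if not awake:
            # Before wake-up, nothing is forwarded; only the wake-up word is checked.
            if chunk.strip().lower() == wake_word.lower():
                awake = True
                if include_wake_word:
                    forwarded.append(chunk)
            continue
        forwarded.append(chunk)
    return forwarded


heard = ["what a nice day", "hey sticker", "turn on the light"]
only_command = gate_by_wake_word(heard, "hey sticker")
with_wake_word = gate_by_wake_word(heard, "hey sticker", include_wake_word=True)
```

Speech captured before the wake-up word never leaves the voice device in either variant.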
  • the voice device 100 of the present application may further include a speaker for voice output of the content received by the communication module from the processing device. Therefore, through the introduction of speakers, it is possible to conduct further voice interaction with users.
  • the content of the voice output by the speaker includes at least one of the following: the statement content of the executed command; and interactive content with the user, for example, to further obtain missing semantic elements.
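The two kinds of voice output content can be sketched as one function that either asks for a missing semantic element or states the command being executed. The required elements, the prompt wording and the function name are assumptions chosen for illustration.

```python
# Hypothetical semantic elements that must be known before a command can be issued.
REQUIRED_ELEMENTS = ("device_type", "action")

PROMPTS = {
    "device_type": "Which device do you mean?",
    "action": "What should I do with it?",
}


def next_voice_output(semantics: dict) -> str:
    """Return interactive content if an element is missing, else a statement."""
    for element in REQUIRED_ELEMENTS:
        if not semantics.get(element):
            return PROMPTS[element]  # interactive content to obtain the missing element
    return f"Executing: {semantics['action']} {semantics['device_type']}."
```

The returned string would be sent through the processing device to the voice device and played by its speaker.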
  • the voice interaction will be described in detail in the following description with reference to the voice control system.
  • the speaker can be implemented with miniaturized integrated components.
  • the voice device 100 may also include an external speaker connection module, so that the connected external speaker performs voice output on the content received by the communication module from the processing device, thereby providing better-quality audio output than the built-in speaker.
  • the external speaker connection module can be connected to the external speaker via a wired or wireless connection.
  • the external speaker connection module may be a Bluetooth communication module.
  • the Bluetooth communication module used to communicate with the processing device can also be reused as an external speaker connection module.
  • the external speaker connection module may be a wired connection module including an audio interface, for example, a 3.5mm jack used to connect to a traditional speaker.
  • Figure 3 shows an example of the composition of an intelligent voice sticker in this application.
  • the voice device implemented as a smart voice sticker 300 may include a MIC 310, a communication module implemented as a Bluetooth and/or infrared short-range communication module (BT/IR) 320, a battery 330, and a speaker (SPK) 340.
  • the smart voice sticker 300 may also have, for example, an attachment structure suitable for attaching to any suitable surface (as shown in FIG. 3).
  • the communication module 320 may also include a Zigbee communication module.
  • the MIC 310 converts the received user voice into an electric signal, and sends the above-mentioned electric signal carrying the user's voice information to the BT/IR module 320.
  • the BT/IR module 320 sends the user's voice input data to the processing device, so that the processing device and the cloud can realize semantic recognition and generate the corresponding operation command. Subsequently, the BT/IR module 320 may also obtain, through the processing device, the content data that the cloud wants the voice sticker 300 to output to the user, for example, data for further interaction with the user or for reporting the operation result.
  • the BT/IR module 320 may send the above-mentioned electric signal including the cloud content information to the SPK 340, and the latter performs an electro-acoustic conversion and voice report to the user.
  • TTS speech synthesis
  • the cloud can directly issue the data via the TTS
  • the processing device or the smart voice sticker 300 can include the aforementioned TTS module.
  • the processing device performs voice synthesis based on the content issued by the cloud, and then transmits the signal containing the above-mentioned voice synthesis to the voice sticker 300 .
  • the BT/IR module 320 transmits the above-mentioned information as an electrical signal for the SPK to directly perform electro-acoustic conversion.
  • FIG. 4 shows an example of the circuit processing flow of a voice device of the present application.
  • the microphone array receives external voice input (for example, the user's voice), and wake-up word recognition is performed by a DSP (digital signal processor) with an AEC (acoustic echo cancellation) function.
  • the collected voice is sent via the system-on-chip with WiFi function or its integrated infrared or Bluetooth (or BLE/mesh) link, and the subsequent voice output data (for example, a command execution statement or an interaction question) is received over the same path.
  • the received voice data is then processed by the codec for TTS processing, and then output through the speaker.
  • the power supply supplies energy for the various functional modules of the voice device.
  • FIG. 5 shows a schematic diagram of the composition of a voice kit according to an embodiment of the present application.
  • the voice kit 500 may include the voice device 510 and the processing device 520 described above with reference to FIGS. 1 to 4.
  • the processing device 520 includes a communication unit 521 communicatively connected to the voice device 510, and the processing device 520 receives the voice data collected by the voice device via its communication unit 521, so as to realize semantic recognition of the voice input and the target device operation corresponding to the recognized semantics.
  • the communication unit 521 may be adapted to communicate in a communication mode corresponding to the communication module 511 of the small voice device 510, such as low-power short-distance communication.
  • the aforementioned communication unit 521 is adapted to communicate with the corresponding communication module 511 with Bluetooth, Zigbee and/or infrared technology.
  • the processing device 520 may include a networking unit 522 for uploading the voice data from the user received from the voice device 510 to a server in the cloud.
  • the networking unit 522 is, for example, a module that uses WiFi and/or mobile communication technologies such as 4G and 5G to access the Internet.
  • the server may perform semantic recognition for the voice input to generate and issue operation commands for the operation of the target device corresponding to the recognition semantics.
  • the server may include a local server that communicates with the voice kit at close range; the local server is used to perform semantic recognition for at least part of the voice input, generate operation commands for the target device operation corresponding to the recognized semantics, and issue the operation commands.
  • the local server can be a smart speaker used as a home smart processing terminal. As a result, the processing speed of voice commands is improved through the local server.
  • the above-mentioned local server can be connected to a cloud server to form the "server" in this application as a whole.
  • the kit may include multiple voice devices arranged in different areas, and each processing device is communicatively connected with one or more of the miniaturized devices within the communication range.
  • different voice stickers may be arranged in different rooms (for example, living room, bedroom, bathroom, and kitchen).
  • the multiple voice stickers 510 may be connected to one processing device 520 within the Bluetooth communication range.
  • the only processing device 520 in the kit realizes networking with the cloud.
  • multiple processing devices may be included in the kit to be responsible for a larger number of voice stickers.
  • the processing device 520 may also be used as an external module of the target device (for example, a smart home appliance) to perform networking operations for the target device itself.
  • the user's voice input can control the target device based on different approaches.
  • the target device may directly receive the issued operation command from the server and execute the operation corresponding to the operation command; and/or the processing device 520 receives the operation command issued by the server via its networking unit 522 and issues it to the target device by itself or via the voice device 510.
  • the target device that directly receives the issued operation command from the server and executes it may include a smart home appliance that is itself networked.
  • the target device that obtains the operation command via the processing device 520 or the voice device 510 may include traditional home appliances.
  • the control server can generate the corresponding operation command based on semantic recognition (for example, turn off the air conditioner), look up the infrared operation code of the air conditioner, and send the command to the voice device that serves as the above-mentioned universal infrared remote control.
  • the encoding of the above infrared operation may also be implemented locally, for example, implemented at a processing device or a voice device.
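The two delivery paths described above (direct delivery to networked smart appliances, infrared delivery to traditional appliances) can be sketched as a simple routing function. This is only an illustration under assumptions: the `IR_CODES` table, its code values, and the `TargetDevice` and `Delivery` types are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical infrared code table; real codes depend on the appliance model.
IR_CODES: Dict[Tuple[str, str], int] = {
    ("air_conditioner", "power_off"): 0x10AF,
    ("tv", "power_on"): 0x20DF,
}


@dataclass(frozen=True)
class TargetDevice:
    name: str
    kind: str
    networked: bool  # True for a smart appliance that is itself networked


@dataclass(frozen=True)
class Delivery:
    path: str        # "direct" (server to device) or "infrared" (via the voice device)
    payload: object


def route_command(device: TargetDevice, action: str) -> Delivery:
    """Choose the delivery path for an operation command."""
    if device.networked:
        # Networked appliances receive the command directly from the server.
        return Delivery("direct", {"device": device.name, "action": action})
    # Traditional appliances are reached through an infrared code lookup.
    code = IR_CODES.get((device.kind, action))
    if code is None:
        raise KeyError(f"no infrared code for {device.kind}/{action}")
    return Delivery("infrared", code)


ac = TargetDevice("bedroom air conditioner", "air_conditioner", networked=False)
print(route_command(ac, "power_off"))
```

Whether the lookup runs on the server, the processing device, or the voice device is left open here, matching the text's statement that the infrared encoding may also be implemented locally.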
  • the processing device 520 can also use its networking unit 522 to receive the interactive content with the user issued by the server, and the communication unit 521 is also used to send the interactive content with the user.
  • the content of the above interaction can be a confirmation of the device operation (for example, "the light is turned on"), a request for necessary semantic elements (for example, when the user's "light on" voice input is recognized and there is more than one light within range, further asking "Which light to turn on?"), or a combination of the two (for example, "The TV is turned on; which channel would you like to watch?"), etc.
  • the processing device 520 itself may have simple voice recognition and command generation and issuance functions.
  • the processing device 520 may include: a voice recognition unit for performing semantic recognition for the voice input; and an operation command generating unit for generating operation commands for the target device operation corresponding to the recognized semantics.
  • the processing device 520 itself may be a smart speaker connected to a cloud server, or another device that also has a voice collection function.
  • the voice device 510 can be used as an entrance to help the smart speaker to collect voices in different areas (for example, kitchen or bathroom areas where it is not convenient to directly perform voice operations on the smart speakers arranged in the living room).
  • Fig. 6 shows a schematic diagram of the composition of a voice control system according to an embodiment of the present application.
  • the system 600 may include a plurality of voice kits 610 as described above and a server 620.
  • the server 620 may refer to a group of servers that provide specific functions.
  • Each voice kit 610 may include at least one voice device and at least one processing device, for example, the illustrated three voice devices and one processing device, and the voice device and the processing device communicate via short-distance low-power communication means (for example, the illustrated BT).
  • Each voice device can be used for voice collection and transmission in different areas, for example.
  • Each voice kit 610 is connected to the server 620 through the networking function of the processing device (for example, a WiFi module).
  • the server 620 may perform semantic recognition on the voice input uploaded by the processing device to generate and issue operation commands for the target device operation corresponding to the recognized semantics.
  • the server itself can implement all operations such as semantic recognition, operation command generation and issuance, etc.
  • the server 620 may include: a semantic processing server for semantically recognizing the uploaded voice input; a command generating server for generating operation commands for target recognition operations based on the recognized semantics; and a command issuing server for issuing the operation commands.
  • the server 620 may be used only for semantic recognition, or for generating and issuing operation commands for only some target devices. The control of at least some target devices in this application can be implemented by an external server. This is especially applicable when a service provider of a certain brand provides remote control functions for its own smart devices.
  • the server 620 may include a semantic processing server for semantic recognition of the uploaded voice input, and the server sends the recognized semantics to an external server, wherein the external server includes a command generation server that generates operation commands for target recognition operations based on the recognized semantics, and a command issuing server that issues the operation commands.
  • the server 620 may obtain in advance at least one piece of local device configuration information as follows.
  • the local device configuration information may include: the distribution and device information of the voice device, the processing device and/or at least part of the target device itself; and the correspondence between at least two of the voice device, the processing device and the target device.
  • the server 620 can also automatically complete, based on the local device configuration information, the semantic elements that are missing from the recognized semantics but are needed to perform the operation on the target device. For example, because the voice sticker is located in the bathroom, it can be determined that "turn on the light" refers to the only ceiling light in the bathroom.
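The completion of a missing semantic element from local device configuration information can be sketched as below. This is a minimal sketch under assumptions: the dictionaries `STICKER_ROOM` and `ROOM_DEVICES`, their contents, and the function name `complete_target` are hypothetical stand-ins for whatever configuration the server actually holds.

```python
from typing import Dict, List, Optional

# Hypothetical local device configuration obtained in advance by the server:
# which room each voice sticker is in, and which target devices each room contains.
STICKER_ROOM: Dict[str, str] = {
    "sticker-1": "bathroom",
    "sticker-2": "living_room",
}
ROOM_DEVICES: Dict[str, Dict[str, List[str]]] = {
    "bathroom": {"light": ["bathroom ceiling light"]},
    "living_room": {"light": ["floor lamp", "living room ceiling light"]},
}


def complete_target(sticker_id: str, device_type: str) -> Optional[str]:
    """Return the unique matching device in the sticker's room, or None.

    None means the element cannot be completed automatically, so the server
    would instead issue interactive content such as "Which light to turn on?".
    """
    room = STICKER_ROOM[sticker_id]
    candidates = ROOM_DEVICES.get(room, {}).get(device_type, [])
    return candidates[0] if len(candidates) == 1 else None


print(complete_target("sticker-1", "light"))  # bathroom: a single candidate
print(complete_target("sticker-2", "light"))  # living room: ambiguous
```

The ambiguous case ties back to the interaction content described earlier: when more than one candidate exists, a follow-up question is needed rather than a guess.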
  • Fig. 7 shows a schematic flowchart of a voice control method according to an embodiment of the present application. The method can be implemented by the above voice device, kit and system.
  • In step S710, the voice device (for example, a miniaturized voice sticker) collects voice input.
  • In step S720, the voice device sends the voice input to the processing device.
  • the above-mentioned transmission may be, for example, short-distance communication based on infrared, Bluetooth and/or Zigbee.
  • In step S730, the processing device implements semantic recognition for the voice input and generation of the corresponding target device operation command.
  • the foregoing operations for semantic recognition and command generation can be completed by different objects, for example, can be completed by the processing device itself, a local server, a remote server, or any combination thereof.
  • the processing device may send the voice input to the server, and use the semantic recognition of the voice input by the server to obtain the operation command of the target device operation corresponding to the semantic recognition.
  • the server may include a local server, and the local server may perform semantic recognition for at least part of the voice input, generate operation commands for the target device operation corresponding to the recognized semantics, and issue the operation commands.
  • In step S730, the processing device may itself perform semantic recognition for at least part of the voice input, and generate an operation command for the target device operation corresponding to the recognized semantics.
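One way to read step S730 is that the processing device handles the part of the voice input it can recognize itself and uploads the rest to a server. The following is a sketch of that reading under assumptions, not the method as claimed: utterances are modelled as text, and `make_local_recognizer`, `handle_voice_input`, and the stand-in `remote` recognizer are hypothetical names.

```python
from typing import Callable, Dict, Optional

Recognizer = Callable[[str], Optional[dict]]


def make_local_recognizer(known: Dict[str, dict]) -> Recognizer:
    """Build a simple recognizer that only knows a fixed set of phrases."""
    def recognize(utterance: str) -> Optional[dict]:
        return known.get(utterance.strip().lower())
    return recognize


def handle_voice_input(utterance: str, local: Recognizer, remote: Recognizer) -> dict:
    """Try recognition on the processing device first, then fall back to the server."""
    command = local(utterance)
    source = "processing_device"
    if command is None:
        command = remote(utterance)  # upload to the local or cloud server
        source = "server"
    if command is None:
        raise ValueError("utterance not understood")
    return {"source": source, **command}


local = make_local_recognizer({"turn on the light": {"device": "light", "action": "on"}})


def remote(utterance: str) -> Optional[dict]:
    # Stand-in for server-side semantic recognition.
    return {"device": "tv", "action": "on"} if "tv" in utterance.lower() else None


print(handle_voice_input("Turn on the light", local, remote))
print(handle_voice_input("Please turn on the TV", local, remote))
```

The fallback order is a design choice of this sketch; the text only states that recognition and command generation can be completed by the processing device, a local server, a remote server, or any combination of them.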
  • the voice control method of the present application may further include: the voice device and/or the processing device obtains an operation command for the target device operation corresponding to the recognized semantics; and the voice device and/or the processing device issues the operation command to the target device.
  • the voice control method of the present application may further include: the processing device obtains the voice output content; the processing device delivers the voice output content to the voice device; and the voice device outputs the voice output content by voice.
  • the voice control method of the present application may further include: the voice device recognizes the wake-up word from the voice input from the user. Therefore, step S720 may include: sending the collected voice input to the processing device after the wake word is recognized.
  • the voice control method of the present application may further include: a server and/or an external server that has obtained the recognized semantics generates and issues, based on the recognized semantics, an operation command for the corresponding target device operation.
  • the voice control method of the present application may further include: the target device receives an operation command issued by the server and/or the external server and executes an operation corresponding to the operation command.
  • the target device receiving the operation command issued by the server and/or the external server and performing the operation corresponding to the operation command may include at least one of the following: the target device directly receives the issued operation command from the server or the external server; the processing device receives the operation command issued by the server via the networking unit, and issues the operation command to the target device by itself or via the voice device.
  • the voice control method of the present application may further include: the server obtains the execution result of the operation command; the server generates the statement content of the execution command based on the execution result and sends it to the processing device; the processing device issues the statement content of the execution command to the voice device; and the voice device outputs the statement content of the execution command by voice.
  • the voice control method of the present application may further include: the server generates content for interacting with the user based on the semantic recognition and sends it to the processing device; the processing device delivers the content of interaction with the user to the voice device; and the voice device outputs the content of interaction with the user by voice.
  • the voice control method of the present application may further include: the server automatically fills in the semantic elements that are missing in the operation on the target device in the recognition semantics based on the local device configuration information obtained in advance.
  • the server involved in the voice control method of the present application may be a local server, so the local server may perform semantic recognition for at least part of the voice input, generate operation commands for the target device operation corresponding to the recognized semantics, and issue the operation commands.
  • FIG. 8 shows an example of voice control in this application.
  • the voice device in the voice kit detects voice command input from the user or a smart device, and the processing device can perform preliminary processing on the collected voice, for example, ASR (speech recognition) pickup, and transmit the picked-up voice command to the cloud.
  • the server can perform subsequent processing on the picked-up voice commands, such as NLP (natural language processing) and NLU (natural language understanding), and perform command analysis and TTS output according to the processing results.
  • the parsed command can be directly transmitted to the target device for execution (for example, the smart device directly executes the command parsed by the cloud), or it can be executed by the target device after being converted by, for example, a processing device or a voice device (for example, infrared commands issued by a voice device for traditional household appliances).
  • the voice device can perform voice output through its own speaker or externally connected Bluetooth or traditional speakers.
  • the voice device, voice kit, and voice control system according to the present application have been described in detail above with reference to the accompanying drawings.
  • the existing technology that incorporates the voice module into the device has a long development cycle and high cost, and requires acoustic design and debugging for each device.
  • the modular solution of the present application can be easily integrated into any required equipment.
  • the kit of the present application is convenient to use, and the miniaturized and portable design of the voice module makes it easy to be placed on various surfaces and suitable for use in multiple scenarios.
  • the smart voice sticker of the present application can also realize the function of a universal infrared remote control by adding an infrared module, thereby realizing intelligent control of traditional infrared devices.
  • the method according to the present application can also be implemented as a computer program or computer program product.
  • the computer program or computer program product includes computer program code instructions for executing the above-mentioned steps defined in the above-mentioned method of the present application.
  • this application can also be implemented as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) on which executable code (or a computer program, or computer instruction code) is stored; when the executable code (or computer program, or computer instruction code) is executed by the processor of an electronic device (or computing device, server, etc.), the processor is caused to execute each step of the above method according to the present application.
  • each block in the flowchart or block diagram may represent a module, program segment, or part of the code, and the module, program segment, or part of the code contains one or more executable instructions for realizing the specified logical function.
  • The functions marked in the blocks may also occur in a different order than marked in the drawings. For example, two consecutive blocks can actually be executed in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and the combination of the blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or it can be realized by a combination of dedicated hardware and computer instructions.

Abstract

The present application discloses a voice control system and method, and a voice suite and a voice apparatus. The system comprises a plurality of voice suites and a server in communication with the voice suites, wherein each voice suite comprises: a voice apparatus, configured to acquire voice input and send the acquired voice input to a processing apparatus; and a processing apparatus, configured to receive the voice input acquired by the voice apparatus, and send the voice input to a server. The server is configured to perform semantic recognition on the voice input sent by the processing apparatus, so as to generate and issue an operation command of a target device operation corresponding to the recognized semantic. Thus, using a discrete voice apparatus as a voice acquisition entrance, and implementing, by means of a processing apparatus and a server, semantic parsing and command issuing facilitates flexible voice control of various devices, especially various smart home devices and even legacy devices.

Description

Voice control system and method, voice kit and voice device
This application claims priority to Chinese patent application No. 201910339913.9, filed on April 25, 2019 and entitled "Voice control system and method, and voice kit and voice device", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of information technology, and in particular to a voice control system and method, as well as a voice kit and a voice device.
Background
With the popularization and development of smart home technology, voice control of various household appliances has become standard. In the existing technology, voice control can be realized by a smart speaker serving as a control node in the home, or by an appliance that itself has a voice interaction function.
However, existing smart speakers are relatively large and power-hungry, so battery power is usually impractical. The power cord, the large size, and the requirements on the surrounding environment (for example, unsuitability for a humid bathroom) therefore restrict the flexible arrangement and use of smart speakers.
Devices with voice interaction functions are usually equipped with dedicated voice processing modules. These modules can only be arranged inside the device. On the one hand, mold opening and separate debugging are costly. On the other hand, constrained by the assembly of the device, the modules cannot be placed flexibly and adapt poorly.
For this reason, a more flexible and easy-to-implement voice solution is needed.
Summary of the invention
To solve at least one of the above problems, this application proposes a discrete voice device as an entrance for intelligent voice. The voice device collects voice input and sends it to the processing device, which then implements semantic parsing and issues the corresponding commands locally or via the cloud, thereby facilitating flexible control of various devices, especially smart home devices and even traditional devices.
According to one aspect of the present application, a voice control system is proposed, including a plurality of voice kits and a server communicating with the voice kits, wherein each voice kit includes: a voice device for collecting voice input and sending the collected voice input to a processing device; and a processing device for receiving the voice input collected by the voice device and sending the voice input to the server; and the server is used to perform semantic recognition on the voice input sent by the processing device, so as to generate and issue an operation command for the target device operation corresponding to the recognized semantics. The voice kit may include a plurality of the voice devices arranged in different areas, with each processing device communicatively connected to one or more of the voice devices within its communication range. Thus, the convenient arrangement of voice collection entrances improves the flexibility and coverage of voice control.
Preferably, the voice device and the processing device communicate over a short distance; for example, each includes a low-power short-distance communication module for local short-distance communication with the other, and the communication module includes at least one of the following: a Bluetooth communication module that communicates with the processing device based on Bluetooth technology; an infrared communication module that communicates with the processing device based on infrared technology; and a Zigbee communication module that communicates with the processing device based on Zigbee technology.
The server can be implemented as a remote server, in which case the processing device can include a WiFi module for remote communication with the server.
Preferably, the voice device recognizes a wake-up word from the collected voice input, and sends the collected voice input to the processing device after the wake-up module recognizes the wake-up word.
Preferably, the voice device performs voice output on the content received from the processing device and/or the target device via its own speaker or an external speaker connected by wire or wirelessly. The content of the voice output may include at least one of the following: the statement content of the execution command; and content for interacting with the user, for example, to further obtain missing semantic elements.
Preferably, the voice device is also used to perform infrared output on the command received by the processing device, so as to realize the operation of the target device corresponding to the command.
Preferably, the voice device further includes an attachment mechanism for attaching to a wall, the processing device, the target device, or the surface of another facility.
Preferably, the processing device is configured to perform semantic recognition for at least part of the voice input, and generate operation commands for the target device operation corresponding to the recognized semantics.
In the voice control system of the present application, the target device may directly receive the issued operation command from the server and execute the operation corresponding to the operation command; and/or the processing device may receive the operation command issued by the server and issue the operation command to the target device by itself or via the voice device. Correspondingly, the target device that directly receives and executes the issued operation command from the server may include a networked smart home appliance, and the target device that obtains the operation command via the processing device or the voice device may include a traditional home appliance.
Preferably, the server may include: a semantic processing server for semantic recognition of the uploaded voice input; a command generation server for generating operation commands for target recognition operations based on the recognized semantics; and a command issuing server for issuing the operation commands.
As an alternative or supplement, command generation and issuance for some target devices can be completed by an external server. In that case, the server may include a semantic processing server for semantic recognition of the uploaded voice input, and the server sends the recognized semantics to an external server, wherein the external server includes a command generation server that generates operation commands for target recognition operations based on the recognized semantics, and a command issuing server that issues the operation commands.
Preferably, the server may obtain in advance at least one of the following pieces of local device configuration information: the distribution and device information of the voice device, the processing device, and/or at least some of the target devices themselves; and the correspondence between at least two of the voice device, the processing device, and the target device.
Preferably, the server automatically completes, based on the local device configuration information, the semantic elements that are missing from the recognized semantics but are needed to perform the operation on the target device.
Preferably, the server may include a local server that communicates with the voice kit at close range, and the local server may be used to perform semantic recognition for at least part of the voice input, generate operation commands for the target device operation corresponding to the recognized semantics, and issue the operation commands.
According to another aspect of the present application, a voice kit is provided, including: a voice device for collecting voice input and sending the collected voice input to a processing device via local communication; and the processing device, which includes a communication unit communicatively connected to the voice device. The processing device receives the voice data collected by the voice device via the communication unit, so as to realize semantic recognition of the voice input and the target device operation corresponding to the recognized semantics.
Preferably, the voice device and the processing device may each include a low-power short-range communication module for short-range communication with each other, the communication module including at least one of the following: a Bluetooth communication module that communicates based on Bluetooth technology; an infrared communication module that communicates based on infrared technology; and a Zigbee communication module that communicates based on Zigbee technology.
The processing device may include a networking unit for uploading the user's voice data received from the voice device to a server, where the server performs semantic recognition on the voice input so as to generate and issue an operation command for the target device operation corresponding to the recognized semantics. Thus, by placing the networking function in the processing device instead of the voice device, intelligent AI voice processing can be realized through the cloud while the voice sticker remains small.
Preferably, the target device may receive the issued operation command directly from the server and perform the corresponding operation; and/or the processing device receives the operation command issued by the server via the networking unit and issues it to the target device either itself or via the voice device. In this way, commands issued from the cloud can be executed in different ways depending on the application scenario: direct control of home appliances, control by the AI module, control by the voice sticker, and so on. Specifically, target devices that receive and execute operation commands directly from the server may include smart home appliances that are themselves networked, and target devices that obtain operation commands via the processing device or the voice device may include traditional home appliances.
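The three delivery paths described above can be sketched as a simple routing decision. This is only an illustration: the criterion used to choose between the processing device and the voice device (whether the voice device has an infrared output) is an assumption, and the names `Route` and `choose_route` are hypothetical.

```python
from enum import Enum


class Route(Enum):
    DIRECT = "server -> networked smart appliance"
    VIA_PROCESSING = "server -> processing device -> appliance"
    VIA_VOICE = "server -> processing device -> voice device -> appliance"


def choose_route(target_is_networked: bool, voice_device_has_ir: bool) -> Route:
    """Pick how an operation command issued by the server reaches the target."""
    if target_is_networked:
        # Networked smart appliances receive the command from the server directly.
        return Route.DIRECT
    if voice_device_has_ir:
        # Assumed preference: a voice device with infrared output relays the
        # command to a traditional appliance.
        return Route.VIA_VOICE
    # Otherwise the processing device issues the command itself.
    return Route.VIA_PROCESSING
```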
Further, the networking unit may also be used to receive content for interaction with the user issued by the server, and the communication unit may also be used to send that interaction content to the voice device for voice output. This further completes the voice sticker's function as a voice portal.
Depending on the application scenario, the voice devices and processing devices in the kit can be combined freely. For example, the kit may include multiple voice devices arranged in different areas, with each processing device communicatively connected to one or more of these miniaturized voice devices within its communication range.
Alternatively or additionally, the processing device may further include: a speech recognition unit for performing semantic recognition on the voice input; and an operation command generation unit for generating an operation command for the target device operation corresponding to the recognized semantics. This allows simple commands to be processed quickly and locally.
According to one aspect of the present application, a voice device is provided, including: a microphone for collecting voice input; and a communication module for sending the collected voice input to a processing device, so as to realize semantic recognition of the voice input and the target device operation corresponding to the recognized semantics. Thus, by using a voice sticker that retains only the simplest voice collection and communication functions as the entry point for voice input, voice function modules can be arranged flexibly and voice can be collected conveniently. Further, the device may include an attachment mechanism for attaching to a wall, the processing device, a target device, or the surface of another fixture, for example taking the form of a smart voice sticker.
Preferably, the device may further include a wake-up module for recognizing a wake-up word in the voice input from the user, and the communication module sends the collected voice input to the processing device after the wake-up module has recognized the wake-up word. The wake-up module can be implemented with an existing low-power miniaturized DSP, thereby providing far-field wake-up while preserving low power consumption and small size.
Preferably, the device may further include a speaker for outputting, as speech, the content that the communication module receives from the processing device. The content output by the speaker includes at least one of the following: a statement about the command being executed; and content for interaction with the user, for example to obtain missing semantic elements. This further completes the device's function as a voice portal.
Preferably, the device may further include an external speaker connection module, so that a connected external speaker outputs, as speech, the content that the communication module receives from the processing device. The external speaker connection module is at least one of the following: the communication module, where it includes a Bluetooth connection function for connecting to an external Bluetooth speaker; and a wired external speaker connection module including an audio jack. The device can thus be combined with an existing Bluetooth or traditional speaker, so that the speaker becomes a smart device once connected to the voice sticker, with voice control and access to cloud resources.
Preferably, the device may further include an infrared module for outputting, as an infrared signal, a command that the communication module receives from the processing device, so as to operate the target device corresponding to the command. The device can thus double as a universal infrared remote control for traditional home appliances.
Preferably, the communication module is a low-power short-range communication module, for example a Bluetooth communication module that communicates with the processing device based on Bluetooth technology, an infrared communication module that communicates with the processing device based on infrared technology, or a Zigbee communication module that communicates with the processing device based on Zigbee technology. This makes miniaturization and low-power communication possible.
Preferably, the device may further include a power supply module including at least one of the following: a wireless charging component; a battery component; and a USB port. The device's low power consumption removes the need for a power cord, further improving its portability.
According to yet another aspect of the present application, a voice control method is provided, which can be implemented by the voice device, kit, and system described above, and includes: the voice device collecting voice input; the voice device sending the voice input to a processing device; and the processing device realizing semantic recognition of the voice input and generation of the corresponding target device operation command.
Preferably, the voice device sending the voice input to the processing device includes: the voice device sending the voice input to the processing device via short-range communication.
Preferably, the processing device realizing semantic recognition of the voice input and generation of the corresponding target device operation command includes: the processing device uploading the voice input to a server; and the server performing semantic recognition on the voice input to obtain an operation command for the target device operation corresponding to the recognized semantics.
Preferably, the processing device uploading the voice input to the server includes: the processing device uploading the voice input to a local server via short-range communication.
Preferably, the server performing semantic recognition on the voice input to obtain an operation command for the target device operation corresponding to the recognized semantics includes: the local server performing semantic recognition on at least part of the voice input, generating an operation command for the target device operation corresponding to the recognized semantics, and issuing the operation command.
Preferably, the processing device realizing semantic recognition of the voice input and generation of the corresponding target device operation command includes: the processing device performing semantic recognition on at least part of the voice input and generating an operation command for the target device operation corresponding to the recognized semantics.
Preferably, the method may further include: the voice device and/or the processing device obtaining the operation command for the target device operation corresponding to the recognized semantics; and the voice device and/or the processing device issuing the operation command to the target device.
Preferably, the method may further include: the processing device obtaining voice output content issued to it; the processing device sending the voice output content to the voice device; and the voice device outputting the voice output content as speech.
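The method steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions: plain text stands in for the collected audio, a lookup table stands in for local semantic recognition of simple commands, and `server_recognize` is a stub for the server. All names, phrases, and commands are hypothetical.

```python
from typing import Optional, Tuple

Command = Tuple[str, str]  # (target device, operation)

# Simple commands that the processing device is assumed to recognize locally.
LOCAL_COMMANDS = {
    "turn on the light": ("light", "on"),
    "turn off the light": ("light", "off"),
}


def server_recognize(text: str) -> Optional[Command]:
    """Stub for the server-side semantic recognition and command generation."""
    if text == "set the air conditioner to 26 degrees":
        return ("air_conditioner", "set_temperature:26")
    return None


def handle_voice_input(utterance: str) -> dict:
    """Processing-device side: recognize locally first, then fall back to the server."""
    text = utterance.strip().lower()
    command = LOCAL_COMMANDS.get(text)
    source = "local"
    if command is None:
        command = server_recognize(text)
        source = "server"
    if command is None:
        # Nothing recognized: return interaction content for voice output.
        return {"command": None, "source": source,
                "speech": "Sorry, I did not understand."}
    device, operation = command
    # The speech field is the voice output content sent back to the voice device.
    return {"command": command, "source": source,
            "speech": f"OK, {device} {operation}."}
```

Trying the local table first reflects the fast local handling of simple commands, while anything else is uploaded for recognition by the server.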
The external, discrete modular solution of the present application can be conveniently integrated into any device that needs it. In addition, the kit of the present application is easy to use: the miniaturized, portable design of the voice module allows it to be placed on all kinds of surfaces and makes it suitable for many scenarios. Further, the smart voice sticker of the present application can serve as a universal infrared remote control by adding an infrared module, thereby enabling intelligent control of traditional infrared devices.
Description of the Drawings
The above and other objects, features, and advantages of the present disclosure will become more apparent from the following more detailed description of exemplary embodiments of the present disclosure, given in conjunction with the accompanying drawings, in which the same reference numerals generally denote the same components.
FIG. 1 is a schematic diagram of the composition of a voice device according to an embodiment of the present application;
FIG. 2 shows an example of the voice device implemented as a smart voice sticker;
FIG. 3 shows an example of the composition of a smart voice sticker of the present application;
FIG. 4 shows an example of the circuit processing flow of a voice device of the present application;
FIG. 5 is a schematic diagram of the composition of a voice kit according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the composition of a voice control system according to an embodiment of the present application;
FIG. 7 is a schematic flowchart of a voice control method according to an embodiment of the present application;
FIG. 8 shows an example of voice control according to the present application.
Detailed Description
Preferred embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the present disclosure, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be thorough and complete and will fully convey its scope to those skilled in the art.
As described above, existing smart home voice interaction solutions (for example, smart speakers and devices with voice interaction functions) suffer from problems such as low flexibility and limited usage scenarios. For this reason, a discrete voice device is proposed as the entry point for intelligent voice. The voice device collects voice input and sends it via local communication to the processing device, which then performs semantic parsing and issues the corresponding command, either locally or via the cloud. This makes it easy to flexibly control all kinds of devices, especially smart home devices and even traditional devices.
FIG. 1 is a schematic diagram of the composition of a voice device according to an embodiment of the present application. As shown in FIG. 1, the voice device 100 may include a microphone (MIC) 110 and a communication module 120. The MIC 110 is used to collect voice input. The communication module 120 is used to send the collected voice input to the processing device, so that semantic recognition of the voice input and the target device operation corresponding to the recognized semantics can be realized on the basis of the processing device. As described below, the target device may be a smart device or a traditional home appliance.
The voice device can be combined with the processing device to form a voice kit (as shown in FIG. 5 below). Here, "kit" refers to a group of devices that work together to achieve a specific function. In the present application, the voice device, being flexible and easy to place, serves as an entry point for collecting users' voice signals, for example in different areas, while the processing device aggregates the voice information collected by the voice devices and, through local and/or cloud semantic processing, realizes semantic recognition of the voice input and the target device operation corresponding to the recognized semantics.
Where cloud semantic recognition and command issuance are involved, the voice kit can also be combined with a server to form a voice control system (as shown in FIG. 6 below). The server may be a local server that communicates with the voice kit over a short distance, or a remote server (for example, a server cluster) that communicates remotely with the processing devices in multiple voice kits and provides each of them with cloud-based semantic recognition, command generation, and command issuance.
Returning to FIG. 1, the MIC is a transducer that converts sound into an electrical signal. As shown in the figure, the MIC 110 can collect the sound signal produced by the user, convert it into an electrical signal containing the user's voice information, and send it to the communication module 120. The communication module 120 can then send the electrical signal containing the user's voice information to the processing device as the user's voice input data, for subsequent semantic recognition of the voice input and the target device operation corresponding to the recognized semantics. In other embodiments, the MIC may also collect sound signals not produced by the user, for example sound signals emitted by other smart devices.
Thus, by retaining only the simplest voice collection and communication functions, the voice device 100 can achieve low power consumption and a small size, which makes it easy to install flexibly.
To achieve miniaturization and low power consumption, the communication module 120 may be a low-power short-range communication module. Here, "short-range communication" refers to short-distance wireless communication whose range is typically within a few hundred meters. In one embodiment, the communication module 120 may be a Bluetooth (BT) communication module that communicates with the processing device based on Bluetooth technology, for example a communication module based on a Bluetooth Mesh solution. In another embodiment, the communication module 120 may be an infrared (IR) communication module that communicates with the processing device based on infrared technology, for example a high-speed infrared transmission module. In yet another embodiment, the communication module 120 may be a Zigbee communication module that communicates with the processing device based on Zigbee technology. In other embodiments, the communication module 120 may also use a combination of BT and IR. It should be understood that the communication module 120 of the present application may also be implemented with new low-power short-range communication technologies developed in the future, so that the miniaturization and low power consumption of the device itself create the conditions for flexible placement.
In other embodiments, the voice device 100 may also include a WiFi communication module, which has relatively high power consumption and usually requires stronger processing capability, in order to communicate with the processing device over a local area network. Of course, in some embodiments the WiFi communication module may also be used for short-range communication.
As described above, the voice device 100 may include an infrared module for short-range, low-power communication with the processing device. In other embodiments, it may also include an infrared module for operating the target device, or the infrared communication module may be reused as a command-applying module. To this end, the infrared module may output, as an infrared signal, a command that the communication module 120 receives from the processing device, so as to operate the target device corresponding to the command. Operation via infrared output is particularly suitable for traditional target devices, for example televisions and air conditioners controlled by infrared remote controls. An infrared module used as an infrared remote control can generate and send an infrared signal matching the receiving frequency band of the specific device being operated. The voice device 100 can thus also serve as a universal infrared remote control. In other embodiments, the voice device may also control the target device using other technologies (for example, by reusing the Bluetooth communication module).
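How a command received from the processing device might be mapped to an infrared output can be sketched as a per-device lookup. The profile table below is purely illustrative: the carrier frequencies and codes are placeholders rather than real remote-control codes, and the names `IR_PROFILES` and `build_ir_frame` are hypothetical.

```python
# Hypothetical per-device infrared profiles; all values are placeholders.
IR_PROFILES = {
    "tv": {"carrier_khz": 38, "codes": {"power": 0x01, "volume_up": 0x02}},
    "air_conditioner": {"carrier_khz": 38, "codes": {"power": 0x10, "cool": 0x11}},
}


def build_ir_frame(device: str, operation: str) -> dict:
    """Look up the carrier and code to emit for a command on a given device."""
    profile = IR_PROFILES.get(device)
    if profile is None:
        raise KeyError(f"no infrared profile for device: {device}")
    code = profile["codes"].get(operation)
    if code is None:
        raise KeyError(f"device {device} has no infrared code for: {operation}")
    # The frame describes what the infrared module should transmit.
    return {"carrier_khz": profile["carrier_khz"], "code": code}
```

Keeping one profile per target device is what would let a single voice device act as a universal remote control for several traditional appliances.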
In one embodiment, the voice device 100 further includes a power supply module, including but not limited to: a wireless charging component; a battery component; and a USB port. Because the voice collection and transmission functions of the voice device 100 require very little energy, its power consumption is relatively low, which makes it suitable for a power supply structure that needs no power cord. This greatly improves the portability and flexibility of the voice device 100 itself.
Because the voice device 100 of the present application is suited to a form that needs no directly connected power cord, it may also include an attachment mechanism for attaching to a wall, the processing device, a target device, or the surface of another fixture. FIG. 2 shows an example implemented as a smart voice sticker. As shown in FIG. 2, the voice device 100 may include, for example, a magnet or an electrostatic adhesion surface as the attachment mechanism 200, so that it can be conveniently attached to the surface of another fixture, such as the side wall of the household appliance (for example, a microwave oven or oven) shown in the figure.
To further reduce power consumption and avoid accidental operation, the voice device 100 of the present application preferably also includes a far-field wake-up function. Here, "far-field wake-up" refers to waking the voice device with a specific spoken wake-up word. For example, the commercially available Tmall Genie can be woken with the wake-up word "Tmall Genie". Specifically, the voice device 100 may further include a wake-up module for recognizing the wake-up word in the voice input from the user. The communication module 120 can then send the collected voice input to the processing device after the wake-up module has recognized the wake-up word. For example, where the wake-up word is used only for waking and carries no other instruction, the communication module 120 can receive and transmit the user's voice input that follows the wake-up word. Where the wake-up word itself also carries an instruction, the communication module 120 can receive and transmit the wake-up word itself together with the subsequent voice input. Because the wake-up module can be implemented with an existing miniaturized low-power DSP (digital signal processing) circuit, adding the far-field wake-up function does not materially affect the miniaturization and low power consumption of the voice device 100.
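The two forwarding behaviors described above can be sketched as a small gating function. In this illustration, already-segmented text stands in for audio, and an exact string match stands in for the DSP's wake-up word detection; the name `gate_on_wake_word` and its parameters are hypothetical.

```python
from typing import List


def gate_on_wake_word(segments: List[str], wake_word: str,
                      include_wake_word: bool = False) -> List[str]:
    """Return the segments that the communication module would forward.

    Nothing is forwarded until the wake-up word is detected. If the wake-up
    word itself carries an instruction, set include_wake_word=True so that
    it is forwarded together with the input that follows it.
    """
    forwarded: List[str] = []
    awake = False
    for segment in segments:
        if not awake:
            if segment == wake_word:
                awake = True
                if include_wake_word:
                    forwarded.append(segment)
            continue  # Segments heard before waking are discarded.
        forwarded.append(segment)
    return forwarded
```

For example, with the segments `["noise", "tmall genie", "turn on", "light"]` and the wake-up word `"tmall genie"`, the default mode forwards only `["turn on", "light"]`.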
In a preferred embodiment, the voice device 100 of the present application may further include a speaker for outputting, as speech, the content that the communication module receives from the processing device. Introducing a speaker makes further voice interaction with the user possible. The content output by the speaker includes at least one of the following: a statement about the command being executed; and interaction content, for example content for interacting with the user in order to obtain missing semantic elements. Voice interaction is described in detail below in the description of the voice control system.
To keep the voice device 100 small, the speaker can be implemented with miniaturized integrated components. Further, the voice device 100 may also include an external speaker connection module, so that a connected external speaker outputs, as speech, the content that the communication module receives from the processing device, providing better audio quality than the built-in speaker. In one embodiment, the external speaker connection module can connect to the external speaker via a wired or wireless connection. The external speaker connection module may be a Bluetooth communication module; in other words, the Bluetooth communication module used to communicate with the processing device can also be reused as the external speaker connection module. Alternatively or additionally, the external speaker connection module may be a wired connection module including an audio interface, for example a 3.5 mm jack for connecting to a traditional speaker.
FIG. 3 shows an example of the composition of a smart voice sticker of the present application. As shown in FIG. 3, the voice device implemented as a smart voice sticker 300 may include a MIC 310, a communication module implemented as a Bluetooth and/or infrared short-range communication module (BT/IR) 320, a battery 330, and a speaker (SPK) 340. The smart voice sticker 300 may also have, for example, an attachment structure suitable for attaching to any suitable surface (as shown in FIG. 3). In other embodiments, the communication module 320 may also include a Zigbee communication module.
Specifically, the MIC 310 converts the received user speech into an electrical signal and sends the electrical signal carrying the user's voice information to the BT/IR module 320. The BT/IR module 320 sends the user's voice input data to the processing device, so that semantic recognition and generation of the corresponding operation command can be realized using the processing device and the cloud. Subsequently, the BT/IR module 320 may also obtain, via the processing device, content data that the cloud wants the voice sticker 300 to output to the user, for example data for further interaction with the user or for reporting the result of an operation. The BT/IR module 320 can send the electrical signal containing this cloud content to the SPK 340, which performs electro-acoustic conversion and reports it to the user as speech.
In different embodiments, TTS (speech synthesis) may be performed by different entities. For example, the cloud may directly issue data that has already undergone TTS, or the processing device or the smart voice sticker 300 may include the TTS module. In one embodiment, for reasons of transmission efficiency and of keeping the smart voice sticker small and low-power, the processing device preferably performs speech synthesis based on the content issued by the cloud and then transmits the signal containing the synthesized speech to the voice sticker 300; the BT/IR module 320 passes this information on as an electrical signal so that the SPK can perform electro-acoustic conversion directly.
FIG. 4 shows an example of the circuit processing flow of a voice device of the present application. As shown in FIG. 4, a microphone array receives external voice input (for example, the user's speech), and a DSP (digital signal processor) with AEC (acoustic echo cancellation) performs wake-up word recognition. After the wake-up word is recognized, the collected voice is sent, and subsequent voice output data (for example, command execution statements or interaction questions) is received, via a WiFi-capable system-on-chip or the infrared or Bluetooth (or BLE/mesh) function integrated on it. The received voice data then undergoes TTS processing by a codec and is output through the speaker. Throughout, the power supply powers the functional modules of the voice device.
As described above, the voice device of the present application can be combined with a processing device to obtain a voice kit that provides the voice collection and networking functions required for local operation. Figure 5 shows a schematic diagram of the composition of a voice kit according to an embodiment of the present application. As shown in Figure 5, the voice kit 500 may include the voice device 510 described above with reference to Figures 1 to 4 and a processing device 520.
The processing device 520 includes a communication unit 521 communicatively connected to the voice device 510. The processing device 520 receives, via its communication unit 521, the voice data collected by the voice device, in order to implement semantic recognition of the voice input and the target device operation corresponding to the recognized semantics. The communication unit 521 may be adapted to communicate in a mode matching that of the communication module 511 of the small voice device 510, for example low-power short-range communication. In one embodiment, the communication unit 521 is adapted to communicate with the corresponding communication module 511 using Bluetooth, Zigbee and/or infrared technology.
Specifically, the semantic recognition and the generation and issuing of operation commands may be implemented in the cloud. To this end, the processing device 520 may include a networking unit 522 for uploading the user's voice data received from the voice device 510 to a server in the cloud. The networking unit 522 is, for example, a module that accesses the Internet using WiFi and/or mobile communication technologies such as 4G and 5G. The server may perform semantic recognition on the voice input, so as to generate and issue an operation command for the target device operation corresponding to the recognized semantics.
In some embodiments, the semantic recognition and the generation and issuing of operation commands may be implemented locally. The server may then include a local server in short-range communication with the voice kit. The local server performs semantic recognition on at least part of the voice input, generates an operation command for the target device operation corresponding to the recognized semantics, and issues the operation command. For example, the local server may be a smart speaker serving as a home smart processing terminal. The local server thus increases the processing speed of voice commands. In one embodiment, the local server may be connected to a cloud server, the two together constituting the "server" of the present application.
Depending on the application scenario, the kit may include multiple voice devices arranged in different areas, with each processing device communicatively connected to one or more of these miniaturized voice devices within its communication range. For example, in a home scenario, different voice stickers may be placed in different rooms (for example, the living room, bedroom, bathroom and kitchen). These voice stickers 510 may be connected to one processing device 520 within Bluetooth communication range, and the single processing device 520 in the kit provides the connection to the cloud. In other embodiments, for example in a company scenario, the kit may include multiple processing devices to serve a larger number of voice stickers. In other embodiments, the processing device 520 may also serve as an external module of a target device (for example, a smart home appliance) to perform networking operations for that target device itself.
Depending on the control scenario, the user's voice input can control the target device through different paths. For example, in different embodiments, the target device may receive the issued operation command directly from the server and perform the corresponding operation; and/or the processing device 520 may receive the operation command issued by the server via its networking unit 522 and issue it to the target device either itself or via the voice device 510. Target devices that receive and execute operation commands directly from the server may include smart home appliances that are themselves networked. Target devices that obtain operation commands via the processing device 520 or the voice device 510 may include traditional home appliances.
For example, suppose all the smart home appliances running in a home are already connected to a control server. In that case, an operation command for a smart home appliance (for example, lowering the temperature of the refrigerator compartment) can be issued directly by the control server. For a traditional home appliance that must be controlled with the corresponding infrared code, the server can generate the corresponding operation command based on semantic recognition (for example, turn off the air conditioner), look up the infrared operation code of the air conditioner, and issue the command directly to the voice device serving as a universal infrared remote control. In other embodiments, the infrared encoding may also be performed locally, for example at the processing device or the voice device.
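The two delivery paths described above can be sketched as a small routing function. This is only an illustration: the `TargetDevice` record, the path labels and the infrared code value are hypothetical and are not specified by the application.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TargetDevice:
    """Hypothetical record for a controllable appliance (not from the application)."""
    name: str
    networked: bool  # True: smart appliance already connected to the server
    ir_codes: Dict[str, str] = field(default_factory=dict)  # made-up IR code table


def route_command(device: TargetDevice, action: str) -> Tuple[str, str]:
    """Return (delivery path, payload) for an operation command."""
    if device.networked:
        # Smart appliance: the server issues the command to it directly.
        return ("server_direct", action)
    # Traditional appliance: look up the infrared code and let the voice
    # device, acting as a universal infrared remote, emit it.
    if action not in device.ir_codes:
        raise KeyError(f"no infrared code for {action!r} on {device.name}")
    return ("voice_device_ir", device.ir_codes[action])


fridge = TargetDevice("fridge", networked=True)
aircon = TargetDevice("air conditioner", networked=False,
                      ir_codes={"power_off": "0x10AF"})

print(route_command(fridge, "lower_temperature"))
print(route_command(aircon, "power_off"))
```

In this sketch the code lookup happens in one place; as the text notes, an implementation could equally perform it at the server, the processing device or the voice device.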
Where the voice device 510 includes a speaker for voice output, the processing device 520 may also use its networking unit 522 to receive content for interaction with the user issued by the server, and the communication unit 521 is further used to send this interaction content to the voice device for voice output. The interaction content may be a confirmation of a device operation (for example, "the light is on"), a request for a required semantic element (for example, when the user's "turn on the light" voice input is recognized and there is more than one light within range, a follow-up question "which light?"), or a combination of the two (for example, "the TV is on; which channel would you like to watch?").
In one embodiment, the processing device 520 itself may provide simple speech recognition and command generation and issuing functions. To this end, the processing device 520 may include: a speech recognition unit for performing semantic recognition on the voice input; and an operation command generation unit for generating an operation command for the target device operation corresponding to the recognized semantics. As a result, the kit of the present application can not only understand complex semantics by connecting to the server in the cloud, but can also respond quickly to simple input.
In one embodiment, the processing device 520 itself may be a smart speaker connected to the cloud server, or another device that also has its own voice collection function. Here, the voice device 510 can serve as an entry point that helps the smart speaker collect voice in other areas (for example, a kitchen or bathroom, from which it is inconvenient to speak directly to a smart speaker placed in the living room).
Further, the kit described above can be combined with a server to form a voice control system. Figure 6 shows a schematic diagram of the composition of a voice control system according to an embodiment of the present application. As shown in Figure 6, the system 600 may include a plurality of voice kits 610 as described above and a server 620. Here, the server 620 may refer to a group of servers that provide specific functions.
Each voice kit 610 may include at least one voice device and at least one processing device, for example, the three voice devices and one processing device shown, and the voice devices and the processing device communicate via short-range low-power communication (for example, BT as shown). Each voice device may be used, for example, to collect and transmit voice in a different area.
Each voice kit 610 is connected to the server 620 through the networking function of its processing device (for example, a WiFi module). The server 620 may perform semantic recognition on the voice input uploaded by the processing device, so as to generate and issue an operation command for the target device operation corresponding to the recognized semantics.
In one embodiment, the server itself may perform all of the operations, including semantic recognition and the generation and issuing of operation commands. The server 620 may then include: a semantic processing server for performing semantic recognition on the uploaded voice input; a command generation server that generates, based on the recognized semantics, an operation command for the recognized target operation; and a command issuing server that issues the operation command.
In another embodiment, the server 620 may be used only for semantic recognition, or for generating and issuing operation commands for only some target devices. Control of at least some target devices may then be implemented by an external server. This applies in particular where a service provider of a given brand offers remote control functions for its own smart devices. In this case, the server 620 may include a semantic processing server for performing semantic recognition on the uploaded voice input, and the server sends the recognized semantics to an external server, where the external server includes a command generation server that generates, based on the recognized semantics, an operation command for the recognized target operation, and a command issuing server that issues the operation command.
In one embodiment, the server 620 may obtain in advance at least one of the following items of local device configuration information: the distribution and device information of the voice devices, the processing devices and/or at least some of the target devices themselves; and the correspondence between at least two of the voice devices, the processing devices and the target devices. Based on this local device configuration information, the server 620 can automatically fill in semantic elements that are missing from the recognized semantics but are needed to perform the operation on the target device. For example, because the voice sticker is located in the bathroom, the server can determine on its own that "turn on the light" means turning on the only ceiling light in the bathroom.
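The completion of missing semantic elements can be illustrated with a short sketch. It assumes a hypothetical configuration layout (`voice_device_room` and `devices` keys) and a hypothetical intent dictionary; none of these names come from the application.

```python
from typing import Dict, List


def complete_semantics(intent: Dict, voice_device_id: str, config: Dict) -> Dict:
    """Fill in a missing target device from the room of the capturing voice device."""
    if intent.get("target"):
        return intent  # nothing is missing
    room = config["voice_device_room"].get(voice_device_id)
    candidates: List[str] = [
        dev_id
        for dev_id, info in config["devices"].items()
        if info["room"] == room and info["type"] == intent["device_type"]
    ]
    if len(candidates) == 1:
        # Exactly one matching device in the room: the target is unambiguous.
        return {**intent, "target": candidates[0]}
    # Zero or several candidates: the user has to be asked.
    return {**intent, "needs_clarification": True}


CONFIG = {
    "voice_device_room": {"sticker-1": "bathroom", "sticker-2": "living room"},
    "devices": {
        "bath-ceiling-light": {"room": "bathroom", "type": "light"},
        "floor-lamp": {"room": "living room", "type": "light"},
        "ceiling-light": {"room": "living room", "type": "light"},
    },
}

intent = {"action": "on", "device_type": "light"}
print(complete_semantics(intent, "sticker-1", CONFIG))
print(complete_semantics(intent, "sticker-2", CONFIG))
```

With this sample configuration, the bathroom request resolves to the single bathroom light, while the living-room request is flagged for a follow-up question such as "which light?".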
In addition, the present application may also be implemented as a voice control method. Figure 7 shows a schematic flowchart of a voice control method according to an embodiment of the present application. The method may be carried out by the voice device, kit and system described above.
In step S710, the voice device (for example, a miniaturized voice sticker) collects voice input. In step S720, the voice device sends the voice input to the processing device. In one embodiment, this transmission may use short-range communication based on, for example, infrared, Bluetooth and/or Zigbee.
In step S730, the processing device implements semantic recognition of the voice input and generation of the corresponding target device operation command. In different embodiments, the semantic recognition and command generation may be performed by different entities, for example, by the processing device itself, a local server, a remote server, or any combination of these.
Accordingly, in one embodiment, in step S730, the processing device may send the voice input to the server and rely on the server's semantic recognition of the voice input to obtain an operation command for the target device operation corresponding to the recognized semantics.
In yet another embodiment, in step S730, the server may include a local server, and the local server may perform semantic recognition on at least part of the voice input, generate an operation command for the target device operation corresponding to the recognized semantics, and issue the operation command.
In another embodiment, in step S730, the processing device performs semantic recognition on at least part of the voice input and generates an operation command for the target device operation corresponding to the recognized semantics.
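One way to combine the variants of step S730 is to let the processing device handle simple input itself and fall back to the server otherwise. The sketch below is an assumption-laden illustration: it works on already-transcribed text, and `LOCAL_COMMANDS`, `recognize` and `fake_server` are hypothetical names.

```python
from typing import Callable, Dict, Tuple

Command = Tuple[str, str]  # (device, action)

# Hypothetical table of simple phrases the processing device understands itself.
LOCAL_COMMANDS: Dict[str, Command] = {
    "turn on the light": ("light", "on"),
    "turn off the light": ("light", "off"),
}


def recognize(utterance: str,
              server_recognize: Callable[[str], Command]) -> Tuple[str, Command]:
    """Return (who recognized the input, resulting command)."""
    key = utterance.strip().lower()
    if key in LOCAL_COMMANDS:
        # Simple input: handled on the processing device, no round trip.
        return ("processing_device", LOCAL_COMMANDS[key])
    # Anything else is forwarded to the (local or cloud) server.
    return ("server", server_recognize(utterance))


def fake_server(utterance: str) -> Command:
    """Stand-in for server-side semantic recognition."""
    return ("tv", "play_news")


print(recognize("Turn on the light ", fake_server))
print(recognize("play the evening news on TV", fake_server))
```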
In one embodiment, the voice control method of the present application may further include: obtaining, by the voice device and/or the processing device, an operation command for the target device operation corresponding to the recognized semantics; and issuing, by the voice device and/or the processing device, the operation command to the target device.
In one embodiment, the voice control method of the present application may further include: the processing device obtaining issued voice output content; the processing device issuing the voice output content to the voice device; and the voice device outputting the voice output content by voice.
In one embodiment, the voice control method of the present application may further include: the voice device recognizing a wake word in the voice input from the user. Step S720 may then include: sending the collected voice input to the processing device after the wake word has been recognized.
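The wake-word gating of step S720 can be sketched as follows. Text strings stand in for audio frames, the function name `forward_after_wake` is hypothetical, and whether the frame containing the wake word is itself forwarded is a design choice the application leaves open; this sketch drops it.

```python
from typing import Iterable, List


def forward_after_wake(frames: Iterable[str], wake_word: str) -> List[str]:
    """Return only the frames captured after the wake word was recognized."""
    awake = False
    forwarded: List[str] = []
    for frame in frames:
        if not awake:
            # Before wake-up nothing leaves the voice device.
            if wake_word in frame:
                awake = True
            continue
        forwarded.append(frame)  # would be sent to the processing device
    return forwarded


captured = ["background noise", "hello sticker", "turn on", "the light"]
print(forward_after_wake(captured, "hello sticker"))
```

Gating on the voice device keeps the short-range link idle until the user actually addresses the device, which fits the low-power goal stated for the voice sticker.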
In one embodiment, the voice control method of the present application may further include: the server, and/or an external server that has obtained the semantic recognition result, generating and issuing, based on that result, an operation command for the target device operation corresponding to the recognized semantics.
In one embodiment, the voice control method of the present application may further include: the target device receiving the operation command issued by the server and/or the external server and performing the operation corresponding to the operation command. Accordingly, this step may include at least one of the following: the target device receiving the issued operation command directly from the server or the external server; and the processing device receiving, via the networking unit, the operation command issued by the server and issuing it to the target device either itself or via the voice device.
In one embodiment, the voice control method of the present application may further include: the server obtaining the execution result of the operation command; the server generating, based on the execution result, a statement about the command execution and issuing it to the processing device; the processing device issuing the statement to the voice device; and the voice device outputting the statement by voice.
In one embodiment, the voice control method of the present application may further include: the server generating, based on the semantic recognition, content for interaction with the user and issuing it to the processing device; the processing device issuing the interaction content to the voice device; and the voice device outputting the interaction content by voice.
In one embodiment, the voice control method of the present application may further include: the server automatically filling in, based on local device configuration information obtained in advance, semantic elements that are missing from the recognized semantics but are needed to perform the operation on the target device.
In one embodiment, the server involved in the voice control method of the present application may be a local server. The local server may then perform semantic recognition on at least part of the voice input, generate an operation command for the target device operation corresponding to the recognized semantics, and issue the operation command.
Figure 8 shows an example of voice control according to the present application. As shown in Figure 8, in the local collection and upload phase, the voice device in the voice kit (and, in some embodiments, a voice collection module in the processing device) detects voice command input from the user or from a smart device. The processing device may perform preliminary processing on the collected voice, for example ASR (speech recognition) pickup, and transmit the picked-up voice command to the cloud. In the cloud processing phase, the server may perform subsequent processing on the picked-up voice command, such as NLP (natural language processing) and NLU (natural language understanding), and perform command parsing and TTS output according to the processing result. In the local processing phase, the parsed command may be transmitted directly to the target device for execution (for example, a smart device directly executes the command parsed by the cloud), or may be executed by the target device after conversion by the processing device or the voice device (for example, an infrared instruction emitted by the voice device for a traditional home appliance). In addition, when there is audio output, the voice device can output it through its own speaker or through an externally connected Bluetooth or conventional loudspeaker.
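The three phases of Figure 8 can be read as a simple chain of stages. The sketch below uses trivial stand-in functions for each stage (all names are hypothetical, and real ASR, language understanding and TTS components would replace them); it only shows the order in which data flows.

```python
from typing import Callable, Dict


def run_voice_control(audio: bytes,
                      asr: Callable[[bytes], str],
                      understand: Callable[[str], Dict[str, str]],
                      execute: Callable[[Dict[str, str]], str],
                      speak: Callable[[str], str]) -> str:
    text = asr(audio)            # local collection and upload phase
    command = understand(text)   # cloud processing phase: parse the command
    result = execute(command)    # local processing phase: target device acts
    return speak(result)         # spoken report through the voice device


# Stand-in stages, for illustration only.
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_understand(text: str) -> Dict[str, str]:
    action = "on" if " on " in f" {text} " else "off"
    return {"device": "light", "action": action}


def toy_execute(command: Dict[str, str]) -> str:
    return f"the {command['device']} is {command['action']}"


def toy_speak(sentence: str) -> str:
    return f"[TTS] {sentence}"


print(run_voice_control(b"turn on the light",
                        toy_asr, toy_understand, toy_execute, toy_speak))
```

Passing the stages in as callables mirrors the point made in the text that each step may run on a different entity (voice device, processing device, local server or cloud).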
The voice device, voice kit and voice control system according to the present application have been described in detail above with reference to the accompanying drawings. The existing approach of building a voice module into each device has a long development cycle and high cost, and requires acoustic design and tuning for every device. In contrast, the modular solution of the present application can be conveniently integrated with any device that needs it. In addition, the kit of the present application is easy to use: the miniaturized, portable design of the voice module allows it to be placed on all kinds of surfaces, making it suitable for many scenarios. Furthermore, the smart voice sticker of the present application can provide a universal infrared remote control function through an added infrared module, thereby enabling intelligent control of traditional infrared devices.
In addition, the method according to the present application may also be implemented as a computer program or computer program product that includes computer program code instructions for executing the steps defined in the above method of the present application.
Alternatively, the present application may be implemented as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) storing executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to execute the steps of the above method according to the present application.
Those skilled in the art will also understand that the various exemplary logic blocks, modules, circuits and algorithm steps described in connection with this disclosure may be implemented as electronic hardware, computer software, or a combination of both.
The flowcharts and block diagrams in the accompanying drawings show the architecture, functions and operations of possible implementations of systems and methods according to multiple embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, program segment or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The embodiments of the present application have been described above. The description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application or their improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (43)

  1. A voice control system, comprising a plurality of voice kits and a server communicating with the voice kits, wherein each voice kit comprises:
    a voice device, configured to collect voice input and send the collected voice input to a processing device;
    the processing device, configured to receive the voice input collected by the voice device and upload the voice input to the server; and
    the server is configured to perform semantic recognition on the voice input sent by the processing device, so as to generate and issue an operation command for the target device operation corresponding to the recognized semantics.
  2. The system of claim 1, wherein the voice device sends the collected voice input to the processing device via short-range communication.
  3. The system of claim 2, wherein the voice kit comprises a plurality of the voice devices arranged in different areas, and each processing device is communicatively connected to one or more of the voice devices within its short-range communication range.
  4. The system of claim 2, wherein the voice device and the processing device each comprise a low-power short-range communication module for short-range communication with each other, the communication module comprising at least one of the following:
    a Bluetooth communication module that communicates with the processing device based on Bluetooth technology;
    an infrared communication module that communicates with the processing device based on infrared technology;
    a Zigbee communication module that communicates with the processing device based on Zigbee technology.
  5. The system of claim 1, wherein the processing device uploads the voice input to the server via remote communication.
  6. The system of claim 5, wherein the processing device further comprises a WiFi module for remote communication with the server.
  7. The system of claim 1, wherein the voice device recognizes a wake word in the collected voice input, and sends the collected voice input to the processing device after the wake-up module has recognized the wake word.
  8. The system of claim 1, wherein the voice device outputs by voice, via its own speaker or an external loudspeaker connected by wire or wirelessly, content received from the processing device and/or the target device.
  9. The system of claim 8, wherein the content output by voice comprises at least one of the following:
    a statement about the execution of a command; and
    content for interaction with the user.
  10. The system of claim 1, wherein the voice device is further configured to output by infrared the command received by the processing device, so as to operate the target device corresponding to the command.
  11. The system of claim 1, wherein the voice device further comprises an attachment mechanism for attaching to a wall, the processing device, a target device, or the surface of another facility.
  12. The system of claim 1, wherein the processing device is configured to perform semantic recognition on at least part of the voice input and generate an operation command for the target device operation corresponding to the recognized semantics.
  13. The system of claim 1, wherein:
    the target device receives the issued operation command directly from the server and performs the operation corresponding to the operation command; and/or
    the processing device receives the operation command issued by the server and issues the operation command to the target device either itself or via the voice device.
  14. The system of claim 13, wherein:
    the target device that receives the issued operation command directly from the server and executes it comprises a networked smart home appliance; and
    the target device that obtains the operation command via the processing device or the voice device comprises a traditional home appliance.
  15. The system of claim 1, wherein the server comprises:
    a semantic processing server for performing semantic recognition on the uploaded voice input;
    a command generation server that generates, based on the recognized semantics, an operation command for the recognized target operation; and
    a command issuing server that issues the operation command.
  16. The system according to claim 1, wherein the server includes:
    a local server in short-range communication with the voice kit, the local server being configured to perform semantic recognition on at least part of the voice input, generate an operation command for the target device operation corresponding to the recognized semantics, and issue the operation command.
  17. The system according to claim 1, wherein the server includes:
    a semantic processing server for performing semantic recognition on the uploaded voice input, and
    the server sends the recognized semantics to an external server, wherein
    the external server includes a command generation server that generates, based on the recognized semantics, an operation command for the target recognition operation, and a command issuing server that issues the operation command.
  18. The system according to claim 1, wherein the server obtains in advance at least one of the following items of local device configuration information:
    the distribution and device information of the voice device, the processing device, and/or at least some of the target devices themselves;
    the correspondence between at least two of the voice device, the processing device, and the target device.
  19. The system according to claim 18, wherein the server, based on the local device configuration information, automatically fills in semantic elements that are missing from the recognized semantics and are needed to perform the operation on the target device.
  20. A voice kit, including:
    a voice device for collecting voice input and sending the collected voice input to a processing device;
    the processing device, including a communication unit communicatively connected to the voice device, wherein the processing device receives, via the communication unit, the voice data collected by the voice device, so as to enable semantic recognition of the voice input and the target device operation corresponding to the recognized semantics.
  21. The kit according to claim 20, wherein the voice device and the processing device each include a low-power short-range communication module for short-range communication with each other, the communication module including at least one of the following:
    a Bluetooth communication module that communicates with the processing device using Bluetooth technology;
    an infrared communication module that communicates with the processing device using infrared technology; and
    a Zigbee communication module that communicates with the processing device using Zigbee technology.
  22. The kit according to claim 20, wherein the processing device further includes:
    a networking unit for uploading the user's voice data received from the voice device to a server, wherein the server performs semantic recognition on the voice input so as to generate and issue an operation command for the target device operation corresponding to the recognized semantics.
  23. The kit according to claim 20, wherein the processing device includes:
    a voice recognition unit for performing semantic recognition on the voice input; and
    an operation command generation unit for generating an operation command for the target device operation corresponding to the recognized semantics.
  24. The kit according to claim 20, wherein the processing device receives, via a networking unit, the operation command issued by the server and issues the operation command to the target device, either itself or via the voice device.
  25. The kit according to claim 24, wherein the networking unit is further configured to:
    receive interactive content issued by the server, and
    the communication unit is further configured to:
    send the interactive content to the voice device for voice output.
  26. The kit according to claim 20, wherein the voice kit includes a plurality of the voice devices arranged in different areas, and each processing device is communicatively connected to one or more of the voice devices within its communication range.
  27. A voice device, including:
    a microphone for collecting voice input;
    a communication module for sending the collected voice input to a processing device, so that semantic recognition of the voice input and the target device operation corresponding to the recognized semantics are realized through the processing device.
  28. The voice device according to claim 27, further including:
    a wake-up module for recognizing a wake-up word in the voice input, and
    the communication module is configured to send the collected voice input to the processing device after the wake-up module recognizes the wake-up word.
  29. The voice device according to claim 27, further including:
    a speaker for voice output of the content that the communication module receives from the processing device and/or the target device.
  30. The device according to claim 29, wherein the content of the voice output includes at least one of the following:
    a statement regarding the executed command; and
    content of an interaction with the user.
  31. The voice device according to claim 27, further including:
    an external speaker connection module connected to an external speaker via a short-range wireless connection and/or a wired connection, so that the external speaker performs voice output of the content that the communication module receives from the processing device and/or the target device.
  32. The voice device according to claim 27, further including:
    an infrared module for outputting, via infrared, the command that the communication module receives from the processing device, so as to operate the target device corresponding to the command.
  33. The voice device according to claim 27, wherein the communication module is a low-power short-range communication module and includes at least one of the following:
    a Bluetooth communication module that communicates with the processing device using Bluetooth technology;
    an infrared communication module that communicates with the processing device using infrared technology; and
    a Zigbee communication module that communicates with the processing device using Zigbee technology.
  34. The voice device according to claim 27, further including:
    a power supply module, the power supply module including at least one of the following:
    a wireless charging component;
    a battery component;
    a USB socket.
  35. The voice device according to claim 27, further including:
    an attachment mechanism for attaching to a wall, the processing device, a target device, or the surface of another facility.
  36. A voice control method, including:
    a voice device collects voice input;
    the voice device sends the voice input to a processing device;
    the processing device realizes semantic recognition of the voice input and generation of the corresponding target device operation command.
  37. The method according to claim 36, wherein the voice device sending the voice input to the processing device includes:
    the voice device sends the voice input to the processing device via short-range communication.
  38. The method according to claim 36, wherein the processing device realizing semantic recognition of the voice input and generation of the corresponding target device operation command includes:
    the processing device uploads the voice input to a server;
    the server performs semantic recognition on the voice input to obtain an operation command for the target device operation corresponding to the recognized semantics.
  39. The method according to claim 38, wherein the processing device uploading the voice input to the server includes:
    the processing device uploads the voice input to a local server via short-range communication.
  40. The method according to claim 39, wherein the server performing semantic recognition on the voice input to obtain the operation command for the target device operation corresponding to the recognized semantics includes:
    the local server performs semantic recognition on at least part of the voice input, generates an operation command for the target device operation corresponding to the recognized semantics, and issues the operation command.
  41. The method according to claim 36, wherein the processing device realizing semantic recognition of the voice input and generation of the corresponding target device operation command includes:
    the processing device performs semantic recognition on at least part of the voice input and generates an operation command for the target device operation corresponding to the recognized semantics.
  42. The method according to claim 36, further including:
    the voice device and/or the processing device obtains the operation command for the target device operation corresponding to the recognized semantics; and
    the voice device and/or the processing device issues the operation command to the target device.
  43. The method according to claim 36, further including:
    the processing device obtains voice output content issued to it;
    the processing device delivers the voice output content to the voice device; and
    the voice device outputs the voice output content as speech.
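
The method of claims 36 to 43, together with the wake-up gating of claim 28 and the semantic completion of claims 18 and 19, can be pictured with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the application: the class names, the wake word, the device identifiers, and the keyword matching that stands in for real semantic recognition. Plain text strings also stand in for audio.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

WAKE_WORD = "hello kit"  # hypothetical wake-up word


@dataclass
class Command:
    device: str  # target device identifier
    action: str  # operation to perform, e.g. "on" or "off"


class Server:
    """Hypothetical server: semantic recognition plus command generation."""

    def __init__(self, config: Dict[str, Dict[str, str]]):
        # Local device configuration obtained in advance:
        # voice device id -> {device type -> target device id}.
        self.config = config

    def recognize(self, voice_device_id: str, text: str) -> Optional[Command]:
        words = text.lower().split()
        action = next((w for w in words if w in ("on", "off")), None)
        nearby = self.config.get(voice_device_id, {})
        kind = next((w for w in words if w in nearby), None)
        if action is None or kind is None:
            return None
        # The utterance names only a device type; the concrete target is
        # filled in from the voice device's configured correspondence.
        return Command(device=nearby[kind], action=action)


class ProcessingDevice:
    """Receives voice input from voice devices and uploads it to the server."""

    def __init__(self, server: Server):
        self.server = server
        self.issued: List[Command] = []  # commands issued to target devices

    def handle(self, voice_device_id: str, voice_input: str) -> Optional[Command]:
        command = self.server.recognize(voice_device_id, voice_input)
        if command is not None:
            self.issued.append(command)
        return command


class VoiceDevice:
    """Collects voice input and forwards it only after the wake-up word."""

    def __init__(self, device_id: str, processor: ProcessingDevice):
        self.device_id = device_id
        self.processor = processor

    def capture(self, utterance: str) -> Optional[Command]:
        text = utterance.lower().strip()
        if not text.startswith(WAKE_WORD):
            return None  # no wake-up word, so nothing is sent
        payload = text[len(WAKE_WORD):].strip(" ,")
        return self.processor.handle(self.device_id, payload)
```

In this sketch, a voice device registered as `"bedroom-mic"` with a configuration mapping `"light"` to `"bedroom-light"` would turn the utterance "Hello kit, turn on the light" into a command for the bedroom light, while the same sentence without the wake word is never forwarded.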
PCT/CN2020/084427 2019-04-25 2020-04-13 Voice control system and method, and voice suite and voice apparatus WO2020216089A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910339913.9 2019-04-25
CN201910339913.9A CN111865728A (en) 2019-04-25 2019-04-25 Voice control system and method, voice suite and voice device

Publications (1)

Publication Number Publication Date
WO2020216089A1 (en) 2020-10-29

Family

ID=72940814

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/084427 WO2020216089A1 (en) 2019-04-25 2020-04-13 Voice control system and method, and voice suite and voice apparatus

Country Status (2)

Country Link
CN (1) CN111865728A (en)
WO (1) WO2020216089A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669844A (en) * 2020-12-25 2021-04-16 美的集团股份有限公司 Method for controlling equipment through voice paste, equipment control method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978873A (en) * 2014-04-08 2015-10-14 蔡依霖 Voice tag structure
CN105206275A (en) * 2015-08-31 2015-12-30 小米科技有限责任公司 Device control method, apparatus and terminal
CN108520746A (en) * 2018-03-22 2018-09-11 北京小米移动软件有限公司 The method, apparatus and storage medium of voice control smart machine
US20180308470A1 (en) * 2017-04-21 2018-10-25 Lg Electronics Inc. Voice recognition apparatus and voice recognition system

Also Published As

Publication number Publication date
CN111865728A (en) 2020-10-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20794056

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20794056

Country of ref document: EP

Kind code of ref document: A1