WO2020062862A1 - Voice interactive control method and device for speaker - Google Patents

Publication number
WO2020062862A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice data
server
control instruction
speaker
voice
Application number
PCT/CN2019/084834
Other languages
French (fr)
Chinese (zh)
Inventor
祁学文
吴海全
迟欣
张恩勤
曹磊
师瑞文
Original Assignee
深圳市冠旭电子股份有限公司
Application filed by 深圳市冠旭电子股份有限公司

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/28: Constructional details of speech recognition systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command

Definitions

  • the invention belongs to the technical field of speaker control, and particularly relates to a method and device for interactive control of speaker voice.
  • smart terminal devices (such as smart speakers, mobile phones, and Bluetooth speakers used with mobile phones) are increasingly connected to cloud servers.
  • users can interact with the cloud server by voice over the network.
  • when the network is poor, it takes a long time to upload the voice to the cloud server and to return the recognition result from it. Because of the network transmission delay, users often have to wait a long time after speaking before they get the voice response from the cloud; the wait is too long and the experience is poor. Some offline voice recognition is currently implemented locally, but it is basically limited to offline command parsing, its application scenarios are limited, and it still falls short of the voice playback effect users want.
  • the embodiments of the present invention provide a method and a device for voice interactive control of a speaker, to solve the prior-art problems of network transmission delay during voice interaction and of the limited application scenarios of offline command parsing.
  • a first aspect of the embodiments of the present invention provides a method for voice interactive control of a speaker, including:
  • a second aspect of the embodiments of the present invention provides a method for interactive voice control of a speaker, including:
  • the voice data matching the control instruction is sent to a speaker for voice data playback.
  • a third aspect of the embodiments of the present invention provides a method for interactive voice control of a speaker, including:
  • Wi-Fi speakers upload application scenarios to the server in advance
  • the server generates voice data corresponding to the application scenario according to the application scenario, and sends the voice data to the Wi-Fi speaker;
  • Wi-Fi speakers receive and buffer the voice data
  • Wi-Fi speakers receive control instructions and match the control instructions with buffered voice data
  • the Wi-Fi speaker plays voice data matching the control instruction.
  • a fourth aspect of the embodiments of the present invention provides a method for interactive voice control of a speaker, including:
  • the application scenario is uploaded to the server in advance by the mobile terminal;
  • the server generates voice data corresponding to the application scenario according to the application scenario and sends it to the mobile terminal;
  • the mobile terminal receives and buffers the voice data
  • the Bluetooth speaker receives the control instruction and sends the control instruction to the mobile terminal
  • the mobile terminal matches the buffered voice data according to the control instruction
  • the mobile terminal sends the voice data matching the control instruction to the Bluetooth speaker;
  • a fifth aspect of the embodiments of the present invention provides a device for voice interactive control of a speaker, including:
  • a first identification module configured to preset an application scenario and upload the application scenario to a server
  • a second identification module configured to receive a control instruction and match the control instruction with the voice data
  • the playing module is configured to play voice data matching the control instruction if the matching is successful.
  • a sixth aspect of the embodiments of the present invention provides a mobile terminal, including:
  • a third identification module configured to preset an application scenario and upload the application scenario to a server
  • a second database for receiving and buffering voice data corresponding to the application scenario returned by the server
  • a fourth identification module configured to receive a control instruction sent by the speaker, and match the control instruction with the voice data
  • the sending module is configured to: if the matching is successful, send the voice data matching the control instruction to a speaker for voice data playback.
  • a seventh aspect of the embodiments of the present invention provides a speaker voice interactive control system.
  • the system includes: a Wi-Fi speaker and a server;
  • the Wi-Fi speaker is used to upload an application scenario to the server in advance, receive and buffer the voice data corresponding to the application scenario, receive a control instruction, match the control instruction with the buffered voice data, and, if the matching is successful, play the voice data matching the control instruction;
  • the server is configured to analyze the application scenario and generate corresponding voice data.
  • a computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the steps of a speaker voice interactive control method.
  • the embodiments of the present invention have the following beneficial effect: when voice interaction is performed through a speaker, the application scenario is uploaded to a server in advance and the voice data corresponding to the application scenario is buffered; when a control instruction is received, it is matched with the buffered voice data, and the voice data is played directly once the matching succeeds, which reduces the network delay during the voice interaction and improves its response speed.
  • FIG. 1 is a schematic diagram of an applicable system scenario of a method for interactively controlling a speaker voice provided by an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a speaker implementation process of a voice interactive control method according to an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of a mobile terminal implementing a voice interaction control method according to an embodiment of the present invention
  • FIG. 4 is a diagram illustrating an example of an interaction flow of a Wi-Fi speaker system of a voice interaction control method according to an embodiment of the present invention
  • FIG. 5 is a diagram illustrating an example of an interaction flow of a Bluetooth speaker system of a voice interaction control method according to an embodiment of the present invention
  • FIG. 6 is a schematic diagram of a device for interactively controlling a voice of a speaker provided by an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of a system scenario to which a method for interactively controlling a speaker voice provided by an embodiment of the present invention is applied. For convenience of explanation, only a part related to this embodiment is shown.
  • the system may include a Wi-Fi speaker 11 and a server 12, wherein the Wi-Fi speaker 11 may upload an application scenario to the server 12 and buffer the voice data corresponding to the application scenario returned by the server 12; the Wi-Fi speaker 11 receives a control instruction, matches the control instruction with the buffered voice data, and, if the matching is successful, plays the voice data that matches the control instruction.
  • the system may further include a Bluetooth speaker 21, a mobile terminal 22, and a server 12.
  • the mobile terminal 22 uploads an application scenario to the server 12, and receives and caches the voice data corresponding to the application scenario returned by the server 12; the Bluetooth speaker 21 receives a control instruction and sends it to the mobile terminal 22, where it is matched against the voice data buffered by the mobile terminal 22. If the matching is successful, the mobile terminal 22 sends the voice data matching the control instruction to the Bluetooth speaker 21 for playback.
  • FIG. 2 shows a schematic diagram of a speaker implementation process of the voice interactive control method according to an embodiment of the present invention.
  • the execution subject of the process is the Wi-Fi speaker 11 shown in FIG. 1, which is detailed as follows:
  • Step S201: an application scenario is set in advance and uploaded to the server.
  • the application scenario may be a commonly used scenario, such as weather or time, or a scenario the user needs for an actual application, such as song search or schedule. In the case of a Wi-Fi speaker, the commonly used or individual application scenarios can be tallied directly on the speaker and uploaded to the server through the network in advance.
  • the application scenario may be one of some default commonly used scenarios; it may be derived from statistics on the user's usage habits, with frequently used scenarios treated as common scenarios; it may be set by the user; or it may be a new application scenario learned from continued input.
  • the step of setting an application scenario in advance and uploading the application scenario to a server includes:
  • the Wi-Fi speaker may be provided with a touch display screen, a microphone array, and buttons.
  • the application scenario may be set by touch input or by voice input; scenarios can also be entered by pressing the buttons.
  • the scene can be input through the Bluetooth speaker and sent to the mobile terminal through the Bluetooth protocol; the mobile terminal performs statistics of the scene and uploads the application scene to the server through the network.
  • Step S202: receive and cache the voice data corresponding to the application scenario returned by the server.
  • the voice data corresponding to the application scenario is generated by the server, which parses one or more application scenarios and generates data for each of them; it may include single-character or multiple-character voice data information. In the case of a Wi-Fi speaker, the returned voice data is received and cached locally on the Wi-Fi speaker; in the case of a Bluetooth speaker, the voice data is received through the mobile terminal and cached locally on the mobile terminal.
  • the voice data corresponding to the application scenario may be fetched and buffered periodically at a set time interval to keep it up to date; for example, for an application scenario that changes over time, such as weather, a fixed time is set for updating the cache. Songs of a similar style may also be cached in advance according to the user's listening habits.
  • Step S203: receive a control instruction, and match the control instruction with the voice data.
  • the control instruction may be a voice control instruction, or a signal control instruction input through a remote control or other equipment;
  • the buffered voice data corresponds to a variety of application scenarios, and the control instruction is matched against the buffered voice data;
  • the matching can be performed by keyword extraction and matching or by string matching, to obtain the voice data that matches the input control instruction.
  • no distinction is made between complex and simple control instructions: whatever the control instruction, it is matched against the locally buffered voice data. In the case of a Wi-Fi speaker, a received voice control instruction is matched directly against the locally buffered voice data; in the case of a Bluetooth speaker, the received control instruction is sent to the mobile terminal and matched against the voice data buffered by the mobile terminal.
  • Step S204: if the matching is successful, the voice data matching the control instruction is played.
  • the voice data matching the control instruction is played through the loudspeaker of the speaker; since the speaker can be set to different playback sound effect modes, the mode can be chosen according to the environment to achieve a better playback effect.
  • voice data matching the control instruction can be sent to the Bluetooth speaker through a mobile terminal to play the voice data.
  • the step of receiving the control instruction and matching the control instruction with the voice data includes:
  • the voice data fed back by the server is received, and the fed-back voice data is played.
  • after a control instruction is received and recognized, it is also uploaded to the server while it is being matched against the local voice data. If the control instruction matches the local voice data successfully, the voice data generated by the server is no longer received, and the matched voice data is played directly through the loudspeaker of the speaker; if no voice data in the local cache matches the control instruction, the voice data that the server generates by parsing the instruction is received, and the returned voice data is played.
  • the Wi-Fi speaker or the mobile terminal saves the application scenario information corresponding to each newly input control instruction, and keeps tallying and learning new application scenarios so that its set of application scenarios becomes more comprehensive.
  • FIG. 3 is a schematic diagram of a mobile terminal implementation process of the voice interaction control method according to an embodiment of the present invention.
  • the execution subject of the process is the mobile terminal 22 shown in FIG. 1, which is detailed as follows:
  • Step S301: an application scenario is set in advance and uploaded to the server.
  • the voice interaction performed by the Bluetooth speaker is realized through the connection with the mobile terminal; the Bluetooth speaker can perform recording and playback, and the voice interaction and voice feedback are completed through the application of the mobile terminal.
  • the mobile terminal can set application scenarios.
  • the application scenarios do not distinguish between complex or simple application scenarios, and only count the application scenarios that are commonly used or newly input by the user.
  • common application scenarios include weather, time, and the like;
  • the application scenario may be one of some default commonly used scenarios; it may be derived from statistics on the user's usage habits, with frequently used scenarios treated as common scenarios; it may be set by the user; or it may be a new application scenario learned from continued input.
  • the application scenario may be set by one or more of touch input, voice input, or key input, and the set application scenario is uploaded to the server through the network; the server may be an independent server, or the cloud corresponding to the application on the mobile terminal.
  • Step S302: receive and cache the voice data corresponding to the application scenario returned by the server.
  • the voice data corresponding to the application scenario is generated by the server, which parses one or more application scenarios and generates voice data for each of them; it may include single-character or multiple-character voice data information. The voice data is received and cached on the mobile terminal.
  • the voice data corresponding to the application scenario may be fetched and buffered periodically at a set time interval to keep it up to date; for example, for an application scenario that changes over time, such as weather, a fixed time is set for updating the cache. Songs of a similar style may also be cached in advance according to the user's listening habits.
  • Step S303: receive a control instruction sent by the speaker, and match the control instruction with the voice data.
  • the control instruction may be a voice control instruction, or a signal control instruction input through a remote control or other equipment;
  • the buffered voice data corresponds to a variety of application scenarios, and the control instruction is matched against the buffered voice data;
  • the matching can be performed by keyword extraction and matching or by string matching, to obtain the voice data that matches the input control instruction.
  • no distinction is made between complex and simple control instructions: whatever the control instruction, it is matched against the locally buffered voice data.
  • Step S304: if the matching is successful, the voice data matching the control instruction is sent to a speaker for voice data playback.
  • the voice data matching the control instruction is sent to the Bluetooth speaker and played through the speaker of the Bluetooth speaker.
  • the method further includes:
  • after the control instruction sent by the Bluetooth speaker is received and recognized, it is also uploaded to the server while it is being matched against the local voice data. If the control instruction matches the local voice data of the mobile terminal successfully, the voice data generated by the server is no longer received, and the matched voice data is sent directly to the Bluetooth speaker for playback; if no voice data in the local cache of the mobile terminal matches the control instruction, the voice data generated by the server is received and sent to the Bluetooth speaker for playback.
  • the mobile terminal saves the application scenario information corresponding to each newly input control instruction, and keeps tallying and learning new application scenarios so that its set of application scenarios becomes more comprehensive.
  • FIG. 4 is a diagram illustrating an example of an interactive flow of a Wi-Fi speaker system of a voice interaction control method according to an embodiment of the present invention.
  • the execution subject participating in the interactive flow includes the Wi-Fi speaker 11 and the server 12 in FIG. 1.
  • the implementation principle is consistent with the implementation principle of each execution subject side described in FIG. 2 and FIG. 3, so this interaction process is only briefly described, and is not described in detail:
  • Wi-Fi speakers upload application scenarios to the server in advance
  • the server generates voice data corresponding to the application scenario according to the application scenario, and sends the voice data to the Wi-Fi speaker;
  • the Wi-Fi speaker receives and buffers the voice data
  • the Wi-Fi speaker receives the control instructions and matches the control instructions with the buffered voice data
  • the Wi-Fi speaker plays voice data matching the control instruction.
  • the Wi-Fi speaker can receive an application scenario of voice input, key input, or touch input.
  • FIG. 5 shows an example flowchart of a Bluetooth speaker system interaction process of a voice interaction control method provided by an embodiment of the present invention
  • the execution subject participating in the interaction process includes the Bluetooth speaker 21, the mobile terminal 22, and the server 12 in FIG. 1.
  • the implementation principle is consistent with the implementation principle of each execution subject side described in FIG. 2 and FIG. 3, so this interaction process is only briefly described, and is not described in detail:
  • the application scenario is uploaded to the server in advance by the mobile terminal;
  • the server generates voice data corresponding to the application scenario according to the application scenario, and sends the voice data to the mobile terminal;
  • the mobile terminal receives and buffers the voice data
  • the Bluetooth speaker receives the control instruction and sends the control instruction to the mobile terminal
  • the mobile terminal matches the buffered voice data according to the control instruction
  • the mobile terminal sends the voice data matching the control instruction to the Bluetooth speaker;
  • Bluetooth speaker plays voice data.
  • the method for controlling voice interaction further includes:
  • the mobile terminal sends the control instruction to the server
  • the server generates corresponding voice data according to the control instruction, and sends the voice data to the mobile terminal;
  • the mobile terminal receives the voice data fed back by the server and sends the voice data to the Bluetooth speaker.
  • the application scenario is uploaded to the server in advance and the voice data corresponding to the application scenario is buffered; when a control instruction is received, it is matched with the buffered voice data, and if the matching is successful the voice data is played directly, which reduces the network delay during the voice interaction and improves its response speed.
  • FIG. 6 is a schematic diagram of a device for voice interactive control of a speaker provided in an embodiment of the present invention.
  • the device includes:
  • a first identification module 61 configured to preset an application scenario and upload the application scenario to a server;
  • a first database 62 configured to receive and cache voice data corresponding to the application scenario returned by the server;
  • a second identification module 63 configured to receive a control instruction and match the control instruction with the voice data
  • the playing module 64 is configured to play voice data matching the control instruction if the matching is successful.
  • an embodiment of the present invention provides a mobile terminal, where the mobile terminal includes:
  • a third identification module configured to preset an application scenario and upload the application scenario to a server
  • a second database for receiving and buffering voice data corresponding to the application scenario returned by the server
  • a fourth identification module configured to receive a control instruction sent by the speaker, and match the control instruction with the voice data
  • the sending module is configured to: if the matching is successful, send the voice data matching the control instruction to a speaker for voice data playback.
  • an embodiment of the present invention provides a speaker voice interactive control system, including: a Wi-Fi speaker and a server; wherein the Wi-Fi speaker is used to upload an application scenario to the server in advance, receive and cache the voice data corresponding to the application scenario, receive a control instruction, and match the control instruction with the buffered voice data; if the matching is successful, the voice data matching the control instruction is played;
  • the server is configured to analyze the application scenario and generate corresponding voice data.
  • an embodiment of the present invention further provides a speaker voice interactive control system, including: a Bluetooth speaker, a mobile terminal, and a server; wherein the Bluetooth speaker is used to receive a control instruction and send the control instruction to the mobile terminal, and to receive voice data sent by the mobile terminal and play the voice data;
  • the mobile terminal is used to upload an application scenario to the server, receive and buffer the voice data corresponding to the application scenario, receive the control instruction sent by the Bluetooth speaker, match the control instruction with the buffered voice data, and, if the matching is successful, send the voice data matching the control instruction to the Bluetooth speaker;
  • the server is configured to analyze an application scenario, generate corresponding voice data, and send the voice data to a mobile terminal.
  • An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program implements steps of a speaker voice interactive control method when the computer program is executed by a processor.
  • the disclosed apparatus / terminal device and method may be implemented in other ways.
  • the device / terminal device embodiments described above are only schematic.
  • the division of the modules or units is only a logical function division.
  • components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional unit.
  • when the integrated module / unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be completed by a computer program instructing the related hardware.
  • the computer program may be stored in a computer-readable storage medium.
  • when the computer program is executed by a processor, the steps of the foregoing method embodiments can be implemented.
  • the computer program includes computer program code, and the computer program code may be in a source code form, an object code form, an executable file, or some intermediate form.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), electric carrier signals, telecommunication signals, and software distribution media.
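The periodic cache refresh of step S202 and the server fallback described for step S203 can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `RefreshingVoiceCache`, the callables `fetch_scenario` and `fetch_instruction`, and the `max_age_s` parameter are hypothetical and do not come from the patent. The text says the control instruction is uploaded to the server while local matching is in progress; for brevity the sketch contacts the server only after a local miss.

```python
import time
from typing import Callable, Dict, List, Tuple


class RefreshingVoiceCache:
    """Hypothetical local cache of per-scenario voice data with timed refresh."""

    def __init__(
        self,
        fetch_scenario: Callable[[str], bytes],     # assumed server call: scenario -> voice data
        fetch_instruction: Callable[[str], bytes],  # assumed server call: instruction -> voice data
        max_age_s: float,                           # assumed refresh interval in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetch_scenario = fetch_scenario
        self.fetch_instruction = fetch_instruction
        self.max_age_s = max_age_s
        self.clock = clock
        # scenario name -> (voice data, time it was fetched)
        self._entries: Dict[str, Tuple[bytes, float]] = {}

    def refresh(self, scenario: str) -> None:
        """Fetch (or re-fetch) the voice data for one scenario and cache it."""
        self._entries[scenario] = (self.fetch_scenario(scenario), self.clock())

    def refresh_stale(self) -> int:
        """Re-fetch every entry older than max_age_s; return how many were refreshed."""
        now = self.clock()
        stale: List[str] = [
            name for name, (_, fetched_at) in self._entries.items()
            if now - fetched_at >= self.max_age_s
        ]
        for name in stale:
            self.refresh(name)
        return len(stale)

    def respond(self, instruction: str) -> Tuple[bytes, str]:
        """Return (voice data, source): local cache on a keyword hit, else the server."""
        text = instruction.lower()
        for name, (voice, _) in self._entries.items():
            if name.lower() in text:
                return voice, "cache"
        return self.fetch_instruction(instruction), "server"
```

A device would call `refresh_stale()` from a timer so that time-sensitive scenarios such as weather stay current, and call `respond()` for each control instruction.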

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to the technical field of speaker control and provides a voice interactive control method and device for a speaker, the method comprising: presetting an application scenario and uploading the same to a server (S201); receiving and caching voice data returned by the server that corresponds to the application scenario (S202); receiving a control command and matching the control command against the voice data (S203); and if a match is found, playing the voice data matching the control command (S204). The present invention reduces the waiting time for smart voice interaction and thus realizes a quick response of voice interaction.

Description

一种音箱语音交互控制的方法及装置Method and device for interactive control of speaker voice 技术领域Technical field
本发明属于音箱控制技术领域,尤其涉及一种音箱语音交互控制的方法及装置。The invention belongs to the technical field of speaker control, and particularly relates to a method and device for interactive control of speaker voice.
背景技术Background technique
目前智能终端设备(如智能音箱,手机,配合手机使用的蓝牙音箱等)越来越多的接入到云端服务器,用户可以通过语音,经过网络与云端服务器进行语音交互。然而,当网络状况不好时,语音上传到云端服务器以及从云端服务器返回识别结果都需要较长的时间,由于网络传输的延时,往往用户说完语音后,需要较长的等待时间,才能获得云端的语音返回,用户语音交互等待时间过长,体验不是很好;目前本地实现了一些离线的语音识别,但基本局限于离线命令解析,应用场景有限,对于用户希望达到的语音播放效果依然欠佳。At present, smart terminal devices (such as smart speakers, mobile phones, Bluetooth speakers used with mobile phones, etc.) are more and more connected to the cloud server, and users can interact with the cloud server through voice through the network. However, when the network is not good, it takes a long time to upload the voice to the cloud server and return the recognition result from the cloud server. Due to the delay of the network transmission, users often need to wait a long time after speaking. Get the voice return from the cloud, the user's voice interaction wait time is too long, the experience is not very good; currently some offline voice recognition is implemented locally, but it is basically limited to offline command parsing, the application scenarios are limited, and the user wants to achieve the voice playback effect still Poor.
Technical problem
In view of this, embodiments of the present invention provide a voice interactive control method and device for a speaker, to solve the prior-art problems of network transmission delay during voice interaction and of the limited application scenarios supported by offline command parsing.
Technical solution
A first aspect of the embodiments of the present invention provides a voice interactive control method for a speaker, including:
presetting an application scenario and uploading the application scenario to a server;
receiving and caching voice data returned by the server that corresponds to the application scenario;
receiving a control instruction and matching the control instruction against the voice data;
if the matching succeeds, playing the voice data that matches the control instruction.
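The four steps above can be sketched in Python as follows. This is only an illustrative sketch: the class name `CachingSpeaker`, the `fake_server` callback that stands in for the network round trip, the placeholder audio bytes, and the simple substring match are all assumptions made for the example and are not taken from the specification.

```python
from typing import Callable, Dict, List, Optional


class CachingSpeaker:
    """Hypothetical speaker that answers from a local cache of pre-fetched replies."""

    def __init__(self, fetch_from_server: Callable[[List[str]], Dict[str, bytes]]):
        # fetch_from_server stands in for the upload/return round trip with the server.
        self._fetch = fetch_from_server
        self._cache: Dict[str, bytes] = {}
        self.played: List[bytes] = []  # records what would be sent to the loudspeaker

    def preset_scenarios(self, scenarios: List[str]) -> None:
        # Steps 1 and 2: upload the scenarios, then cache the returned voice data.
        self._cache.update(self._fetch(scenarios))

    def handle_instruction(self, instruction: str) -> Optional[bytes]:
        # Step 3: match the instruction against the cached voice data.
        for scenario, audio in self._cache.items():
            if scenario in instruction.lower():
                # Step 4: on a match, play the cached audio without a network call.
                self.played.append(audio)
                return audio
        return None  # no match; nothing is played


def fake_server(scenarios: List[str]) -> Dict[str, bytes]:
    # Placeholder for server-side generation of voice data per scenario.
    return {s: f"<audio for {s}>".encode() for s in scenarios}


speaker = CachingSpeaker(fake_server)
speaker.preset_scenarios(["weather", "time"])
result = speaker.handle_instruction("What is the weather today?")
```

In this sketch the only network access happens in `preset_scenarios`; a matched instruction is answered entirely from the local cache, which is the source of the reduced waiting time.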
A second aspect of the embodiments of the present invention provides a voice interactive control method for a speaker, including:
presetting an application scenario and uploading the application scenario to a server;
receiving and caching voice data returned by the server that corresponds to the application scenario;
receiving a control instruction sent by a speaker, and matching the control instruction against the voice data;
if the matching succeeds, sending the voice data that matches the control instruction to the speaker for playback.
A third aspect of the embodiments of the present invention provides a voice interactive control method for a speaker, including:
uploading, by a Wi-Fi speaker, an application scenario to a server in advance;
generating, by the server, voice data corresponding to the application scenario, and sending the voice data to the Wi-Fi speaker;
receiving and caching, by the Wi-Fi speaker, the voice data;
receiving, by the Wi-Fi speaker, a control instruction, and matching the control instruction against the cached voice data;
if the matching succeeds, playing, by the Wi-Fi speaker, the voice data that matches the control instruction.
A fourth aspect of the embodiments of the present invention provides a voice interactive control method for a speaker, including:
uploading, by a mobile terminal, an application scenario to a server in advance;
generating, by the server, voice data corresponding to the application scenario, and sending the voice data to the mobile terminal;
receiving and caching, by the mobile terminal, the voice data;
receiving, by a Bluetooth speaker, a control instruction, and sending the control instruction to the mobile terminal;
matching, by the mobile terminal, the control instruction against the cached voice data;
if the matching succeeds, sending, by the mobile terminal, the voice data that matches the control instruction to the Bluetooth speaker;
playing, by the Bluetooth speaker, the voice data.
A fifth aspect of the embodiments of the present invention provides a voice interactive control device for a speaker, including:
a first identification module, configured to preset an application scenario and upload the application scenario to a server;
a first database, configured to receive and cache voice data returned by the server that corresponds to the application scenario;
a second identification module, configured to receive a control instruction and match the control instruction against the voice data;
a playing module, configured to play the voice data that matches the control instruction if the matching succeeds.
A sixth aspect of the embodiments of the present invention provides a mobile terminal, including:
a third identification module, configured to preset an application scenario and upload the application scenario to a server;
a second database, configured to receive and cache voice data returned by the server that corresponds to the application scenario;
a fourth identification module, configured to receive a control instruction sent by a speaker and match the control instruction against the voice data;
a sending module, configured to send the voice data that matches the control instruction to the speaker for playback if the matching succeeds.
A seventh aspect of the embodiments of the present invention provides a voice interactive control system for a speaker, the system including a Wi-Fi speaker and a server;
wherein the Wi-Fi speaker is configured to upload an application scenario to the server in advance, receive and cache the voice data corresponding to the application scenario, receive a control instruction, match the control instruction against the cached voice data, and, if the matching succeeds, play the voice data that matches the control instruction;
the server is configured to parse the application scenario and generate the corresponding voice data.
An eighth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the voice interactive control method for a speaker.
Beneficial effects
Compared with the prior art, the embodiments of the present invention have the following beneficial effects: when voice interaction is performed through a speaker, the application scenario is uploaded to the server in advance and the voice data corresponding to the application scenario is cached; when a control instruction is received, it is matched against the cached voice data, and if the matching succeeds the voice data is played directly. This reduces the network delay in the voice interaction and improves its response speed.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a system scenario to which the voice interactive control method for a speaker provided by an embodiment of the present invention applies;
FIG. 2 is a schematic flowchart of the speaker-side implementation of the voice interactive control method provided by an embodiment of the present invention;
FIG. 3 is a schematic flowchart of the mobile-terminal-side implementation of the voice interactive control method provided by an embodiment of the present invention;
FIG. 4 is an example diagram of the interaction flow of a Wi-Fi speaker system using the voice interactive control method provided by an embodiment of the present invention;
FIG. 5 is an example diagram of the interaction flow of a Bluetooth speaker system using the voice interactive control method provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of the voice interactive control device for a speaker provided by an embodiment of the present invention.
Embodiments of the invention
In the following description, specific details such as particular system structures and technologies are set forth for the purpose of illustration rather than limitation, so that the embodiments of the present invention can be thoroughly understood. However, it should be clear to those skilled in the art that the present invention can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the present invention. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
To explain the technical solution of the present invention, specific embodiments are described below.
FIG. 1 shows a schematic diagram of a system scenario to which the voice interactive control method for a speaker provided by an embodiment of the present invention applies. For ease of explanation, only the parts related to this embodiment are shown.
Referring to FIG. 1, the system may include a Wi-Fi speaker 11 and a server 12. The Wi-Fi speaker 11 may upload an application scenario to the server 12 and cache the voice data returned by the server 12 that corresponds to the application scenario. When the Wi-Fi speaker 11 receives a control instruction, it matches the control instruction against the cached voice data and, if the matching succeeds, plays the voice data that matches the control instruction.
The system may alternatively include a Bluetooth speaker 21, a mobile terminal 22, and the server 12. The mobile terminal 22 uploads an application scenario to the server 12, and receives and caches the voice data returned by the server 12 that corresponds to the application scenario. The Bluetooth speaker 21 receives a control instruction and sends it to the mobile terminal 22, where it is matched against the voice data cached on the mobile terminal 22. If the matching succeeds, the mobile terminal 22 sends the voice data that matches the control instruction to the Bluetooth speaker 21 for playback.
The voice interaction method for a speaker in the system scenario shown in FIG. 1 is described in detail below:
FIG. 2 shows a schematic flowchart of the speaker-side implementation of the voice interactive control method provided by an embodiment of the present invention. In this embodiment, the flow is executed by the Wi-Fi speaker 11 shown in FIG. 1, as detailed below:
Step S201: preset an application scenario and upload the application scenario to the server.
In this embodiment of the present invention, the application scenario may be one the user commonly uses, such as weather or time, or one the user needs for a particular purpose, such as song search or itinerary planning. For a Wi-Fi speaker, commonly used or individual application scenarios can be tallied directly on the speaker and uploaded to the server over the network in advance.
In addition, the application scenarios may be default common scenarios; may be scenarios identified from statistics on the user's everyday usage habits, with frequently used scenarios treated as common scenarios; may be set by the user; or may be new scenarios acquired by continuously learning from input.
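One way the sources of application scenarios listed above could be combined is sketched below. The function name `select_scenarios`, the `top_n` cutoff, and the sample usage log are hypothetical; the specification does not say how usage frequency is measured or how many scenarios are uploaded.

```python
from collections import Counter
from typing import Iterable, List


def select_scenarios(usage_log: Iterable[str], defaults: List[str],
                     user_defined: List[str], top_n: int = 2) -> List[str]:
    """Merge default, user-defined, and most frequently used scenarios without duplicates."""
    # Treat the top_n most frequently used scenarios as "common" scenarios.
    frequent = [name for name, _ in Counter(usage_log).most_common(top_n)]
    merged: List[str] = []
    for name in defaults + user_defined + frequent:
        if name not in merged:
            merged.append(name)
    return merged


# Hypothetical log of scenarios the user has triggered recently.
log = ["weather", "music", "weather", "news", "music", "weather"]
selected = select_scenarios(log, defaults=["time"], user_defined=["itinerary"])
```

The resulting list is what would be uploaded to the server in step S201; appending each newly seen scenario to the usage log is one simple way to approximate the "continuous learning" mentioned in the text.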
Further, the step of presetting an application scenario and uploading the application scenario to the server includes:
receiving an application scenario entered by touch and uploading it to the server; and/or receiving an application scenario entered by voice and uploading it to the server; and/or receiving an application scenario entered by key press and uploading it to the server.
In this embodiment of the present invention, the Wi-Fi speaker may be provided with a touch display screen, a microphone array, and keys, so an application scenario can be entered by touch, by voice, or by key press.
In addition, for a Bluetooth speaker, the scenario can be entered on the Bluetooth speaker and sent to the mobile terminal over the Bluetooth protocol; the mobile terminal tallies the scenarios and uploads the application scenarios to the server over the network.
Step S202: receive and cache the voice data returned by the server that corresponds to the application scenario.
In this embodiment of the present invention, the voice data corresponding to the application scenario is data that the server generates by parsing one or more application scenarios, with separate data for each scenario; it may include voice data for a single character or for multiple characters. For a Wi-Fi speaker, the returned voice data is received and cached locally on the Wi-Fi speaker; for a Bluetooth speaker, the voice data is received by the mobile terminal and cached locally on the mobile terminal.
In addition, the voice data corresponding to the application scenarios may be fetched and cached periodically at a set time interval to keep it up to date. For example, for scenarios whose content changes over time, such as weather, the cache is updated at fixed times; alternatively, songs in a style similar to the user's listening habits may be cached in advance.
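The periodic refresh described above might look like the following sketch. The class `ScheduledCache`, the per-scenario intervals, and the injected `clock` are assumptions for illustration; on a real device `refresh_due` would presumably be called from a timer, and the specification does not give concrete interval values.

```python
from typing import Callable, Dict, List


class ScheduledCache:
    """Hypothetical cache that re-fetches each scenario once its interval has elapsed."""

    def __init__(self, fetch: Callable[[str], bytes],
                 intervals: Dict[str, float], clock: Callable[[], float]):
        self._fetch = fetch          # stands in for the request to the server
        self._intervals = intervals  # refresh interval in seconds per scenario
        self._clock = clock          # injectable time source (e.g. time.monotonic)
        self._fetched_at: Dict[str, float] = {}
        self.data: Dict[str, bytes] = {}

    def refresh_due(self) -> List[str]:
        """Re-fetch every scenario whose interval has elapsed; return the names refreshed."""
        now = self._clock()
        refreshed = []
        for scenario, interval in self._intervals.items():
            last = self._fetched_at.get(scenario)
            if last is None or now - last >= interval:
                self.data[scenario] = self._fetch(scenario)
                self._fetched_at[scenario] = now
                refreshed.append(scenario)
        return refreshed


now = [0.0]   # simulated clock so the example runs instantly
calls = []


def fake_fetch(scenario: str) -> bytes:
    calls.append(scenario)
    return f"{scenario} v{calls.count(scenario)}".encode()


cache = ScheduledCache(fake_fetch, {"weather": 3600.0, "time": 60.0}, clock=lambda: now[0])
first = cache.refresh_due()    # nothing cached yet, so both scenarios are fetched
now[0] = 120.0
second = cache.refresh_due()   # only "time" has exceeded its 60 s interval
```

Giving fast-changing scenarios a shorter interval keeps their cached replies current while avoiding unnecessary traffic for slow-changing ones.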
Step S203: receive a control instruction and match the control instruction against the voice data.
In this embodiment of the present invention, the control instruction may be a voice control instruction, or a signal control instruction entered through a remote control or another device. The voice data covers multiple application scenarios; the control instruction is matched against the cached voice data, either by extracting and matching keywords or by string matching, to obtain the voice data that matches the input control instruction.
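The two matching strategies named above, string matching and keyword matching, can be contrasted with a small sketch. It assumes the control instruction has already been converted to text; the function names, the keyword table, and the placeholder audio bytes are hypothetical and not part of the specification.

```python
from typing import Dict, Optional


def match_by_string(instruction: str, cache: Dict[str, bytes]) -> Optional[bytes]:
    """Exact string match on the normalised instruction text."""
    return cache.get(instruction.strip().lower())


def match_by_keyword(instruction: str, keywords: Dict[str, str],
                     cache: Dict[str, bytes]) -> Optional[bytes]:
    """Return the cached audio of the first scenario whose keyword occurs in the instruction."""
    words = instruction.lower().split()
    for keyword, scenario in keywords.items():
        if keyword in words:
            return cache.get(scenario)
    return None


# Placeholder cache (scenario -> audio) and keyword table (keyword -> scenario).
cache = {"weather": b"<sunny>", "time": b"<ten o'clock>"}
keywords = {"weather": "weather", "forecast": "weather", "time": "time", "clock": "time"}

exact = match_by_string(" Weather ", cache)
loose = match_by_keyword("what is the forecast today", keywords, cache)
```

String matching is stricter and only succeeds when the whole instruction equals a cached key, whereas keyword matching tolerates free-form phrasing at the cost of possible false matches.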
In addition, control instructions are not classified as complex or simple: whatever the scenario, every control instruction is matched against the locally cached voice data. For a Wi-Fi speaker, the received control instruction undergoes speech recognition and is then matched directly against the locally cached voice data; for a Bluetooth speaker, the control instruction is received and sent to the mobile terminal, which performs speech recognition and then matches it against the voice data cached on the mobile terminal.
Step S204: if the matching succeeds, play the voice data that matches the control instruction.
In this embodiment of the present invention, if voice data matching the control instruction exists in the locally cached database, that voice data is played through the speaker's loudspeaker. Because the speaker supports different playback sound-effect modes, the mode can be set according to the environment to achieve a better playback effect.
In addition, for a Bluetooth speaker, the mobile terminal can send the voice data that matches the control instruction to the Bluetooth speaker for playback.
Further, the step of receiving a control instruction and matching the control instruction against the voice data includes:
uploading the control instruction to the server while matching the control instruction against the voice data;
if the matching fails, receiving the voice data fed back by the server and playing it.
In this embodiment of the present invention, after a control instruction is received and recognized, it is uploaded to the server at the same time as it is matched against the local voice data. If the control instruction matches the local voice data, the voice data generated by the server is no longer received, and the matched voice data is played directly through the speaker's loudspeaker; if the local cache holds no voice data matching the control instruction, the voice data generated by the server's parsing is received and played.
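The "match locally while uploading" behaviour described above can be sketched with a thread pool. This is an assumption about one possible realisation: `respond`, `ask_server`, and the placeholder audio are hypothetical, and a real device would use its own networking stack rather than a local function call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def respond(instruction: str, cache: Dict[str, bytes],
            ask_server: Callable[[str], bytes],
            executor: ThreadPoolExecutor) -> Tuple[str, bytes]:
    """Return (source, audio), preferring the local cache over the server reply."""
    # Upload the instruction at the same time as the local match starts.
    pending = executor.submit(ask_server, instruction)
    local = cache.get(instruction.strip().lower())
    if local is not None:
        # Local hit: the server reply is no longer needed.
        pending.cancel()  # best effort; has no effect if the call is already running
        return "cache", local
    # Local miss: wait for the voice data generated by the server.
    return "server", pending.result()


def fake_server(instruction: str) -> bytes:
    # Placeholder for server-side parsing and voice generation.
    return f"<server audio for {instruction}>".encode()


local_cache = {"weather": b"<cached weather>"}
with ThreadPoolExecutor(max_workers=2) as pool:
    hit = respond("weather", local_cache, fake_server, pool)
    miss = respond("tell a joke", local_cache, fake_server, pool)
```

Because the upload starts before the local lookup finishes, a cache miss costs no more than an ordinary cloud request, while a cache hit returns without waiting on the network.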
In addition, the Wi-Fi speaker or the mobile terminal saves the application scenario information corresponding to each newly input control instruction; by continuously tallying and learning new application scenarios, it builds up a more comprehensive set of application scenarios.
FIG. 3 shows a schematic flowchart of the mobile-terminal-side implementation of the voice interactive control method provided by an embodiment of the present invention. In this embodiment, the flow is executed by the mobile terminal 22 shown in FIG. 1, as detailed below:
Step S301: preset an application scenario and upload the application scenario to the server.
In this embodiment of the present invention, voice interaction through a Bluetooth speaker is realized via its connection to a mobile terminal: the Bluetooth speaker records and plays audio, while the voice interaction and voice feedback are handled by an application on the mobile terminal.
The mobile terminal can set application scenarios. Application scenarios are not classified as complex or simple; only the scenarios the user commonly uses or newly inputs are tallied. Common application scenarios include, for example, weather and time. The application scenarios may be default common scenarios; may be scenarios identified from statistics on the user's everyday usage habits, with frequently used scenarios treated as common scenarios; may be set by the user; or may be new scenarios acquired by continuously learning from input.
In addition, an application scenario may be set through one or more of touch input, voice input, and key input, and the set application scenario is uploaded to the server over the network; the server may be a standalone server or the cloud service corresponding to the mobile terminal's application.
Step S302: receive and cache the voice data returned by the server that corresponds to the application scenario.
In this embodiment of the present invention, the voice data corresponding to the application scenario is voice data that the server generates by parsing one or more application scenarios, with separate data for each scenario; it may include voice data for a single character or for multiple characters. The voice data is received and cached locally on the mobile terminal.
In addition, the voice data corresponding to the application scenarios may be fetched and cached periodically at a set time interval to keep it up to date. For example, for scenarios whose content changes over time, such as weather, the cache is updated at fixed times; alternatively, songs in a style similar to the user's listening habits may be cached in advance.
Step S303: receive a control instruction sent by the speaker, and match the control instruction against the voice data.
In this embodiment of the present invention, the control instruction may be a voice control instruction, or a signal control instruction entered through a remote control or another device. The voice data covers multiple application scenarios; the control instruction is matched against the cached voice data, either by extracting and matching keywords or by string matching, to obtain the voice data that matches the input control instruction.
In addition, control instructions are not classified as complex or simple: whatever the scenario, every control instruction is matched against the locally cached voice data.
The control instruction sent by the Bluetooth speaker is received, undergoes speech recognition, and is then matched against the voice data cached locally on the mobile terminal to obtain the voice data that matches the control instruction.
Step S304: if the matching succeeds, send the voice data that matches the control instruction to the speaker for playback.
In this embodiment of the present invention, if voice data matching the control instruction exists in the database cached locally on the mobile terminal, that voice data is sent to the Bluetooth speaker and played through the Bluetooth speaker's loudspeaker. Because the speaker supports different playback sound-effect modes, the mode can be set according to the environment to achieve a better playback effect.
In addition, the control instruction is uploaded to the server while it is matched against the voice data; if the matching fails, the voice data fed back by the server is received and sent to the Bluetooth speaker for playback, and the application scenario corresponding to the current control instruction is saved.
Optionally, after receiving and caching the voice data returned by the server that corresponds to the application scenario, the method further includes:
receiving a control instruction sent by the speaker, and sending the control instruction to the server;
if the voice data corresponding to the application scenario does not match the control instruction, receiving the voice data fed back by the server that corresponds to the control instruction;
sending the voice data corresponding to the control instruction to the speaker for playback.
In this embodiment of the present invention, after the control instruction sent by the Bluetooth speaker is received and recognized, it is uploaded to the server at the same time as it is matched against the local voice data. If the control instruction matches the voice data cached locally on the mobile terminal, the voice data generated by the server is no longer received, and the matched voice data is sent directly to the Bluetooth speaker for playback; if the mobile terminal's local cache holds no voice data matching the control instruction, the voice data generated by the server's parsing is received and sent to the Bluetooth speaker for playback.
In addition, the mobile terminal saves the application scenario information corresponding to each newly input control instruction; by continuously tallying and learning new application scenarios, it builds up a more comprehensive set of application scenarios.
FIG. 4 shows an example of the interaction flow of a Wi-Fi speaker system using the voice interactive control method provided by an embodiment of the present invention. The participants in this interaction flow are the Wi-Fi speaker 11 and the server 12 in FIG. 1. The flow follows the same implementation principles as described for each participant in FIG. 2 and FIG. 3, so it is described only briefly here:
1. The Wi-Fi speaker uploads the application scenario to the server in advance;
2. The server generates voice data corresponding to the application scenario and sends the voice data to the Wi-Fi speaker;
3. The Wi-Fi speaker receives and caches the voice data;
4. The Wi-Fi speaker receives a control instruction and matches the control instruction against the cached voice data;
5. If the matching succeeds, the Wi-Fi speaker plays the voice data that matches the control instruction.
Optionally, the Wi-Fi speaker may receive an application scenario entered by voice, key press, or touch.
FIG. 5 shows an example of the interaction flow of a Bluetooth speaker system using the voice interactive control method provided by an embodiment of the present invention. The participants in this interaction flow are the Bluetooth speaker 21, the mobile terminal 22, and the server 12 in FIG. 1. The flow follows the same implementation principles as described for each participant in FIG. 2 and FIG. 3, so it is described only briefly here:
1. The mobile terminal uploads the application scenario to the server in advance;
2. The server generates voice data corresponding to the application scenario and sends it to the mobile terminal;
3. The mobile terminal receives and caches the voice data;
4. The Bluetooth speaker receives a control instruction and sends the control instruction to the mobile terminal;
5. The mobile terminal matches the control instruction against the cached voice data;
6. If the matching succeeds, the mobile terminal sends the voice data that matches the control instruction to the Bluetooth speaker;
7. The Bluetooth speaker plays the voice data.
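The seven-step Bluetooth flow above involves three parties, which the following sketch models as plain Python objects. The class names, the direct method calls standing in for the network and Bluetooth links, the substring match, and the placeholder audio bytes are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


class Server:
    def generate(self, scenarios: List[str]) -> Dict[str, bytes]:
        # Step 2: produce voice data for each uploaded scenario (placeholder bytes here).
        return {s: f"<audio:{s}>".encode() for s in scenarios}


class MobileTerminal:
    def __init__(self, server: Server):
        self._server = server
        self._cache: Dict[str, bytes] = {}

    def preload(self, scenarios: List[str]) -> None:
        # Steps 1 and 3: upload the scenarios and cache what the server returns.
        self._cache = self._server.generate(scenarios)

    def match(self, instruction: str) -> Optional[bytes]:
        # Steps 5 and 6: match the instruction and hand back the matching audio, if any.
        for scenario, audio in self._cache.items():
            if scenario in instruction.lower():
                return audio
        return None


class BluetoothSpeaker:
    def __init__(self, phone: MobileTerminal):
        self._phone = phone
        self.played: List[bytes] = []  # records what would reach the loudspeaker

    def on_instruction(self, instruction: str) -> bool:
        # Step 4: forward the instruction to the phone (a direct call stands in for Bluetooth).
        audio = self._phone.match(instruction)
        if audio is None:
            return False
        # Step 7: play the audio received from the phone.
        self.played.append(audio)
        return True


phone = MobileTerminal(Server())
phone.preload(["weather", "time"])
bt_speaker = BluetoothSpeaker(phone)
ok = bt_speaker.on_instruction("what time is it")
```

In this arrangement the Bluetooth speaker only records and plays; the cache and the matching logic live on the mobile terminal, consistent with the division of roles described for FIG. 3.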
可选的,所述的语音交互控制的方法还包括:Optionally, the method for controlling voice interaction further includes:
移动终端将控制指令发送至服务器;The mobile terminal sends the control instruction to the server;
服务器根据控制指令生成相应的语音数据,并将语音数据发送至移动终端;The server generates corresponding voice data according to the control instruction, and sends the voice data to the mobile terminal;
移动终端接收服务器反馈的语音数据,并将语音数据发送至蓝牙音箱。The mobile terminal receives the voice data fed back by the server and sends the voice data to the Bluetooth speaker.
It should be noted that other ordering schemes that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention also fall within the protection scope of the present invention and are not enumerated here.
According to the embodiments of the present invention, when voice interaction is performed through a speaker, the application scenario is uploaded to the server in advance and the voice data corresponding to the application scenario is cached. When a control instruction is received, it is matched against the cached voice data; if the match succeeds, the voice data is played directly. This reduces network delay during voice interaction and improves the response speed of the voice interaction.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
FIG. 6 is a schematic diagram of a device for voice interaction control of a speaker provided by an embodiment of the present invention. In a Wi-Fi speaker, the device includes:
A first identification module 61, configured to preset an application scenario and upload the application scenario to a server;
A first database 62, configured to receive and cache the voice data returned by the server that corresponds to the application scenario;
A second identification module 63, configured to receive a control instruction and match the control instruction against the voice data;
A playing module 64, configured to play the voice data matching the control instruction if the match succeeds.
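The four-module decomposition can be illustrated with a small sketch. The class names mirror the modules named above, but their methods, the dictionary keyed by instruction text, and the `upload_scenario` callable (an assumed stand-in for the server request) are hypothetical choices made only for this example.

```python
from typing import Callable, Dict, List, Optional


class FirstDatabase:
    """Caches voice data, keyed here by the instruction it answers."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def store(self, voice_data: Dict[str, bytes]) -> None:
        self._items.update(voice_data)

    def lookup(self, instruction: str) -> Optional[bytes]:
        return self._items.get(instruction)


class SecondIdentificationModule:
    """Matches a received control instruction against the cached data."""

    def __init__(self, database: FirstDatabase) -> None:
        self._database = database

    def match(self, instruction: str) -> Optional[bytes]:
        return self._database.lookup(instruction)


class PlayingModule:
    """Records what was played; a real device would drive audio hardware."""

    def __init__(self) -> None:
        self.played: List[bytes] = []

    def play(self, voice_data: bytes) -> None:
        self.played.append(voice_data)


class WiFiSpeakerDevice:
    def __init__(self, upload_scenario: Callable[[str], Dict[str, bytes]]) -> None:
        # upload_scenario is assumed to return the server's voice data.
        self._upload_scenario = upload_scenario
        self.database = FirstDatabase()
        self.matcher = SecondIdentificationModule(self.database)
        self.player = PlayingModule()

    def preset_scenario(self, scenario: str) -> None:
        # Role of the first identification module: upload the scenario,
        # then let the database cache whatever the server returns.
        self.database.store(self._upload_scenario(scenario))

    def on_instruction(self, instruction: str) -> bool:
        voice_data = self.matcher.match(instruction)
        if voice_data is None:
            return False  # no match; nothing is played from the cache
        self.player.play(voice_data)
        return True
```

Keeping the cache behind its own class means the matching strategy (exact text here) could be swapped without touching playback.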
Further, an embodiment of the present invention provides a mobile terminal, where the mobile terminal includes:
A third identification module, configured to preset an application scenario and upload the application scenario to a server;
A second database, configured to receive and cache the voice data returned by the server that corresponds to the application scenario;
A fourth identification module, configured to receive a control instruction sent by the speaker and match the control instruction against the voice data;
A sending module, configured to send the voice data matching the control instruction to the speaker for playback if the match succeeds.
Further, an embodiment of the present invention provides a speaker voice interaction control system, including a Wi-Fi speaker and a server, where the Wi-Fi speaker is configured to upload an application scenario to the server in advance, receive and cache the voice data corresponding to the application scenario, receive a control instruction, match the control instruction against the cached voice data, and, if the match succeeds, play the voice data matching the control instruction;
The server is configured to parse the application scenario and generate the corresponding voice data.
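The server's part, turning an uploaded scenario into voice data, might look like the sketch below. The text does not specify how scenarios are parsed or how speech is produced, so everything here is an assumption: the `SCENARIO_REPLIES` table and its entries are made-up illustrative data, and `synthesize` is a placeholder that merely encodes text rather than calling a real text-to-speech engine.

```python
from typing import Dict

# Hypothetical table: scenario name -> {expected instruction: reply text}.
SCENARIO_REPLIES: Dict[str, Dict[str, str]] = {
    "music": {
        "play music": "Playing music.",
        "next song": "Skipping to the next song.",
    },
    "weather": {
        "today's weather": "Here is today's forecast.",
    },
}


def synthesize(text: str) -> bytes:
    # Placeholder for a text-to-speech step; returns UTF-8 bytes only.
    return text.encode("utf-8")


def generate_voice_data(scenario: str) -> Dict[str, bytes]:
    """Return voice data for every instruction expected in the scenario."""
    replies = SCENARIO_REPLIES.get(scenario, {})
    return {instruction: synthesize(reply) for instruction, reply in replies.items()}
```

An unknown scenario yields an empty dictionary, so the client simply caches nothing and every later instruction would have to be answered by the server directly.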
Optionally, an embodiment of the present invention further provides a speaker voice interaction control system, including a Bluetooth speaker, a mobile terminal, and a server, where the Bluetooth speaker is configured to receive a control instruction and send it to the mobile terminal, and to receive the voice data sent by the mobile terminal and play it;
The mobile terminal is configured to upload an application scenario to the server, receive and cache the voice data corresponding to the application scenario, receive the control instruction sent by the Bluetooth speaker, match the control instruction against the cached voice data, and, if the match succeeds, send the voice data matching the control instruction to the Bluetooth speaker;
The server is configured to parse the application scenario, generate the corresponding voice data, and send it to the mobile terminal.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the speaker voice interaction control method.
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the above division of functional modules is used as an example. In practical applications, the above functions may be assigned to different functional units or modules as required; that is, the internal structure of the mobile terminal may be divided into different functional units or modules to complete all or part of the functions described above. The functional modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional modules are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working process of the modules in the above mobile terminal, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed or recorded in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, can implement the steps of the foregoing method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and they shall all fall within the protection scope of the present invention.

Claims (13)

  1. A method for voice interaction control of a speaker, comprising:
    presetting an application scenario and uploading the application scenario to a server;
    receiving and caching voice data returned by the server that corresponds to the application scenario;
    receiving a control instruction and matching the control instruction against the voice data;
    if the match succeeds, playing the voice data matching the control instruction.
  2. The method for voice interaction control of a speaker according to claim 1, wherein presetting an application scenario and uploading the application scenario to a server comprises:
    receiving an application scenario entered by touch input and uploading it to the server;
    and/or
    receiving an application scenario entered by voice input and uploading it to the server;
    and/or
    receiving an application scenario entered by key input and uploading it to the server.
  3. The method for voice interaction control of a speaker according to claim 1, wherein receiving a control instruction and matching the control instruction against the voice data comprises:
    uploading the control instruction to the server while matching the control instruction against the voice data;
    if the match fails, receiving the voice data fed back by the server and playing the fed-back voice data.
  4. A method for voice interaction control of a speaker, comprising:
    presetting an application scenario and uploading the application scenario to a server;
    receiving and caching voice data returned by the server that corresponds to the application scenario;
    receiving a control instruction sent by a speaker and matching the control instruction against the voice data;
    if the match succeeds, sending the voice data matching the control instruction to the speaker for playback.
  5. The method for voice interaction control of a speaker according to claim 4, further comprising, after receiving and caching the voice data returned by the server that corresponds to the application scenario:
    receiving a control instruction sent by the speaker and sending the control instruction to the server;
    if the voice data corresponding to the application scenario does not match the control instruction, receiving voice data corresponding to the control instruction fed back by the server;
    sending the voice data corresponding to the control instruction to the speaker for playback.
  6. A method for voice interaction control of a speaker, comprising:
    a Wi-Fi speaker uploading an application scenario to a server in advance;
    the server generating voice data corresponding to the application scenario and sending the voice data to the Wi-Fi speaker;
    the Wi-Fi speaker receiving and caching the voice data;
    the Wi-Fi speaker receiving a control instruction and matching the control instruction against the cached voice data;
    if the match succeeds, the Wi-Fi speaker playing the voice data matching the control instruction.
  7. The method for voice interaction control of a speaker according to claim 6, further comprising:
    receiving an application scenario entered by voice input, key input, or touch input.
  8. A method for voice interaction control of a speaker, comprising:
    a mobile terminal uploading an application scenario to a server in advance;
    the server generating voice data corresponding to the application scenario and sending it to the mobile terminal;
    the mobile terminal receiving and caching the voice data;
    a Bluetooth speaker receiving a control instruction and sending the control instruction to the mobile terminal;
    the mobile terminal matching the control instruction against the cached voice data;
    if the match succeeds, the mobile terminal sending the voice data matching the control instruction to the Bluetooth speaker;
    the Bluetooth speaker playing the voice data.
  9. The method for voice interaction control according to claim 8, further comprising:
    the mobile terminal sending the control instruction to the server;
    the server generating corresponding voice data according to the control instruction and sending the voice data to the mobile terminal;
    the mobile terminal receiving the voice data fed back by the server and sending the voice data to the Bluetooth speaker.
  10. A device for voice interaction control of a speaker, comprising:
    a first identification module, configured to preset an application scenario and upload the application scenario to a server;
    a first database, configured to receive and cache voice data returned by the server that corresponds to the application scenario;
    a second identification module, configured to receive a control instruction and match the control instruction against the voice data;
    a playing module, configured to play the voice data matching the control instruction if the match succeeds.
  11. A mobile terminal, comprising:
    a third identification module, configured to preset an application scenario and upload the application scenario to a server;
    a second database, configured to receive and cache voice data returned by the server that corresponds to the application scenario;
    a fourth identification module, configured to receive a control instruction sent by a speaker and match the control instruction against the voice data;
    a sending module, configured to send the voice data matching the control instruction to the speaker for playback if the match succeeds.
  12. A speaker voice interaction control system, wherein the system comprises a Wi-Fi speaker and a server;
    wherein the Wi-Fi speaker is configured to upload an application scenario to the server in advance, receive and cache voice data corresponding to the application scenario, receive a control instruction, match the control instruction against the cached voice data, and, if the match succeeds, play the voice data matching the control instruction;
    the server is configured to parse the application scenario and generate the corresponding voice data.
  13. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.
PCT/CN2019/084834 2018-09-28 2019-04-28 Voice interactive control method and device for speaker WO2020062862A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811136680.4 2018-09-28
CN201811136680.4A CN110970032A (en) 2018-09-28 2018-09-28 Sound box voice interaction control method and device

Publications (1)

Publication Number Publication Date
WO2020062862A1 true WO2020062862A1 (en) 2020-04-02

Family

ID=69950965

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/084834 WO2020062862A1 (en) 2018-09-28 2019-04-28 Voice interactive control method and device for speaker

Country Status (2)

Country Link
CN (1) CN110970032A (en)
WO (1) WO2020062862A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110739009A (en) * 2019-09-20 2020-01-31 深圳震有科技股份有限公司 Method and device for playing announcement sound by media resource board, computer equipment and storage medium
CN113421542A (en) * 2021-06-22 2021-09-21 广州小鹏汽车科技有限公司 Voice interaction method, server, voice interaction system and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617797A (en) * 2013-12-09 2014-03-05 腾讯科技(深圳)有限公司 Voice processing method and device
CN104516709A (en) * 2014-11-12 2015-04-15 科大讯飞股份有限公司 Voice assisting method and system based on software operation scene and voice assistant
CN105355201A (en) * 2015-11-27 2016-02-24 百度在线网络技术(北京)有限公司 Scene-based voice service processing method and device and terminal device
US9412361B1 (en) * 2014-09-30 2016-08-09 Amazon Technologies, Inc. Configuring system operation using image data
CN106683677A (en) * 2015-11-06 2017-05-17 阿里巴巴集团控股有限公司 Method and device for recognizing voice
CN107026940A (en) * 2017-05-18 2017-08-08 北京神州泰岳软件股份有限公司 A kind of method and apparatus for determining session feedback information
CN107507615A (en) * 2017-08-29 2017-12-22 百度在线网络技术(北京)有限公司 Interface intelligent interaction control method, device, system and storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103280217B (en) * 2013-05-02 2016-05-04 锤子科技(北京)有限公司 A kind of audio recognition method of mobile terminal and device thereof
CN103247291B (en) * 2013-05-07 2016-01-13 华为终端有限公司 A kind of update method of speech recognition apparatus, Apparatus and system
CN103440867B (en) * 2013-08-02 2016-08-10 科大讯飞股份有限公司 Audio recognition method and system
CN105261366B (en) * 2015-08-31 2016-11-09 努比亚技术有限公司 Audio recognition method, speech engine and terminal
CN105551494A (en) * 2015-12-11 2016-05-04 奇瑞汽车股份有限公司 Mobile phone interconnection-based vehicle-mounted speech recognition system and recognition method
CN105721725A (en) * 2016-02-03 2016-06-29 北京光年无限科技有限公司 Customer service oriented question and answer interaction method and system
US9922649B1 (en) * 2016-08-24 2018-03-20 Jpmorgan Chase Bank, N.A. System and method for customer interaction management
CN107102982A (en) * 2017-04-10 2017-08-29 江苏东方金钰智能机器人有限公司 The high in the clouds semantic understanding system and its operation method of robot
CN107301168A (en) * 2017-06-01 2017-10-27 深圳市朗空亿科科技有限公司 Intelligent robot and its mood exchange method, system
CN107146622B (en) * 2017-06-16 2021-02-19 合肥美的智能科技有限公司 Refrigerator, voice interaction system, method, computer device and readable storage medium
CN108415683A (en) * 2018-03-07 2018-08-17 深圳车盒子科技有限公司 More scene voice householder methods, intelligent voice system, equipment and storage medium

Also Published As

Publication number Publication date
CN110970032A (en) 2020-04-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19867235

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19867235

Country of ref document: EP

Kind code of ref document: A1