WO2020133764A1 - Voice remote control method, system, controlled device and computer-readable storage medium - Google Patents

Voice remote control method, system, controlled device and computer-readable storage medium

Info

Publication number
WO2020133764A1
WO2020133764A1 PCT/CN2019/079991 CN2019079991W WO2020133764A1 WO 2020133764 A1 WO2020133764 A1 WO 2020133764A1 CN 2019079991 W CN2019079991 W CN 2019079991W WO 2020133764 A1 WO2020133764 A1 WO 2020133764A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio data
remote control
cloud server
control command
control terminal
Prior art date
Application number
PCT/CN2019/079991
Other languages
English (en)
French (fr)
Inventor
伍以文
许辉福
袁建强
Original Assignee
深圳创维-Rgb电子有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳创维-Rgb电子有限公司
Publication of WO2020133764A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/34 Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]; sound input device, e.g. microphone
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Definitions

  • the present application relates to the field of intelligent remote control technology, in particular to a voice remote control method, system, controlled device and computer-readable storage medium.
  • the main purpose of the present application is to provide a voice remote control method, system, controlled device, and computer-readable storage medium, aiming to remotely control the controlled device through voice control commands, which solves the problems of complex mechanical remote control operation and slow response speed.
  • the voice remote control method includes the following steps:
  • the first audio data is processed by the remote control terminal according to the obtained user voice remote control instruction
  • the control command text is processed by the cloud server according to the second audio data.
  • before the step of sending the second audio data to the cloud server, the method further includes:
  • the step of processing the first audio data according to a preset rule to obtain second audio data includes:
  • the first audio data is optimized based on the obtained audio optimization standard, and the optimized first audio data is used as the second audio data.
  • before the step of receiving the first audio data sent by the remote control terminal, the method further includes:
  • before the step of sending the second audio data to the cloud server, the method further includes:
  • the step of receiving the first audio data sent by the remote control terminal includes:
  • the present application also provides a voice remote control system, which includes a remote control terminal, a controlled device, and a cloud server;
  • the remote control terminal is used to obtain a user's voice remote control instruction based on a preset condition, and is also used to perform analog-to-digital conversion on the voice remote control instruction to obtain first audio data and send the first audio data to the controlled device;
  • the controlled device is configured to, after receiving the first audio data sent by the remote control terminal, process the first audio data according to a preset rule to obtain second audio data, and send the second audio data to the cloud server;
  • the cloud server is configured to, after receiving the second audio data, recognize the second audio data according to a preset recognition rule, generate control command text, and send the control command text to the controlled device;
  • the controlled device is further configured to receive the control command text delivered by the cloud server, parse the control command text to obtain a control command, and execute the control command.
  • the remote control terminal includes:
  • the receiving unit is configured to receive a start recording instruction or a stop recording instruction input by a user, and send the received start recording instruction or stop recording instruction to the recording unit;
  • the recording unit is used to detect the user's voice remote control instruction and record the detected voice remote control instruction after receiving the start recording instruction.
  • the recording unit is also used to stop the recording action after receiving the stop recording instruction and save the recorded user voice remote control instruction;
  • a processing unit configured to perform analog-to-digital conversion processing on the voice remote control instruction to obtain first audio data
  • the sending unit is configured to send the first audio data to the controlled device.
  • the present application also provides a controlled device, the controlled device includes:
  • a receiving module configured to receive first audio data sent by a remote control terminal, and the first audio data is processed by the remote control terminal according to the obtained user voice remote control instruction;
  • a processing module configured to process the first audio data according to a preset rule to obtain second audio data
  • An upload module used to send the second audio data to the cloud server
  • An execution module configured to receive control command text issued by the cloud server, parse the control command text to obtain a control command, and execute the control command; the control command text is obtained by the cloud server by processing the second audio data.
  • the present application also provides a computer-readable storage medium storing computer-readable instructions for voice remote control which, when executed by a processor, implement the following steps:
  • the first audio data is processed by the remote control terminal according to the obtained user voice remote control instruction
  • the control command text is processed by the cloud server according to the second audio data.
  • This application receives the first audio data sent by the remote control terminal, the first audio data being obtained by the remote control terminal by processing the acquired user voice remote control instruction; processes the first audio data according to a preset rule to obtain second audio data; and sends the second audio data to the cloud server;
  • FIG. 1 is a schematic structural diagram of a hardware operating environment involved in an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a first embodiment of a voice remote control method of the present application
  • FIG. 3 is a schematic flowchart of a second embodiment of a voice remote control method of the present application.
  • FIG. 4 is a schematic flowchart of a third embodiment of a voice remote control method of the present application.
  • FIG. 5 is a schematic flowchart of a fourth embodiment of a voice remote control method of the present application.
  • FIG. 6 is a schematic flowchart of a fifth embodiment of a voice remote control method of the present application.
  • FIG. 1 is a schematic structural diagram of a hardware operating environment involved in a solution of an embodiment of the present application.
  • FIG. 1 is a schematic diagram of the hardware operating environment of the voice remote control device.
  • the voice remote control device in the embodiment of the present application may be a terminal device such as a PC or a portable computer.
  • the voice remote control device may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002.
  • the communication bus 1002 is used to implement connection communication between these components.
  • the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory or a stable memory (non-volatile memory), such as disk storage.
  • the memory 1005 may optionally be a storage device independent of the foregoing processor 1001.
  • the structure of the voice remote control device shown in FIG. 1 does not constitute a limitation on the voice remote control device, which may include more or fewer components than illustrated, a combination of certain components, or a different arrangement of components.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a voice remote control program.
  • the operating system is a program that manages and controls the hardware and software resources of the voice remote control device, and supports the operation of the voice remote control program and other software or programs.
  • the user interface 1003 is mainly used for data communication with various terminals;
  • the network interface 1004 is mainly used for connecting to a background server and performing data communication with the background server;
  • the processor 1001 can be used to call the voice remote control program stored in the memory 1005 and perform the following operations:
  • the first audio data is processed by the remote control terminal according to the obtained user voice remote control instruction
  • the control command text is processed by the cloud server according to the second audio data.
  • FIG. 2 is a schematic flowchart of a first embodiment of a voice remote control method of the present application.
  • the voice remote control method of the embodiment of the present application is applied to a controlled device.
  • the controlled device of the embodiment of the present application may be a terminal device such as a set-top box of a smart TV or a digital TV, which is not specifically limited herein.
  • Step S100: receive first audio data sent by a remote control terminal, wherein the first audio data is obtained by the remote control terminal by processing the acquired user voice remote control instruction;
  • the remote control terminal has a built-in microphone input module.
  • after the microphone input module acquires the user's voice remote control instruction, which is an analog signal, the MCU (microcontroller unit) built into the remote control terminal processes it.
  • the processing may include identifying keywords, extracting the main body of the voice command, sampling the extracted command, PDM (Pulse Density Modulation) modulation, MCU encoding, and so on, thereby converting the analog voice remote control instruction into a digital signal to form DMA (Direct Memory Access) data, that is, the first audio data, which is then sent to the controlled device.
  • the controlled device receives the first audio data sent by the remote control terminal.
  • the remote control terminal establishes a wireless connection with the controlled device, such as a Bluetooth connection, and transmits the first audio data to the controlled device over that connection.
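  • As a rough illustration of the digitization performed on the remote control terminal, the sketch below shows first-order sigma-delta modulation, one common way to produce a PDM bitstream, together with a crude averaging decoder. The source does not specify the modulator; the names `pdm_encode` and `pdm_decode` and the window size of 64 are assumptions made for illustration only.

```python
def pdm_encode(samples):
    """First-order sigma-delta modulator (illustrative, not the patent's method).

    Takes samples in [-1.0, 1.0] and returns a list of 0/1 pulses whose
    density of ones tracks the input amplitude.
    """
    bits = []
    error = 0.0  # running difference between the input and the emitted pulses
    for x in samples:
        error += x
        if error > 0:
            bits.append(1)
            error -= 1.0  # a 1 pulse stands for +1.0
        else:
            bits.append(0)
            error += 1.0  # a 0 pulse stands for -1.0
    return bits


def pdm_decode(bits, window=64):
    """Crude decoder: average each full window of pulses back to [-1.0, 1.0]."""
    out = []
    for start in range(0, len(bits) - window + 1, window):
        ones = sum(bits[start:start + window])
        out.append(2.0 * ones / window - 1.0)
    return out
```

  • A production decoder would normally use a proper decimation filter rather than a plain moving average; the averaging here only shows that the pulse density carries the amplitude.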
  • Step S200: process the first audio data according to a preset rule to obtain second audio data;
  • the controlled device processes the first audio data.
  • for example, the first audio data may be processed through Alsa (Advanced Linux Sound Architecture) with noise reduction to generate a PCM file; the generated PCM file, that is, the second audio data, is then uploaded to the cloud server. The second audio data is the processed recording file stream.
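  • A minimal sketch of step S200, assuming 16-bit little-endian mono PCM and using a simple amplitude noise gate as a stand-in for the unspecified noise reduction. It does not use the ALSA API; `noise_gate`, `to_pcm16le`, `make_second_audio` and the threshold value are hypothetical names and parameters.

```python
import struct


def noise_gate(samples, threshold=200):
    """Zero out samples whose magnitude is below the threshold (assumed rule)."""
    return [s if abs(s) >= threshold else 0 for s in samples]


def to_pcm16le(samples):
    """Clip to the signed 16-bit range and pack as little-endian PCM bytes."""
    clipped = [max(-32768, min(32767, int(s))) for s in samples]
    return struct.pack("<%dh" % len(clipped), *clipped)


def make_second_audio(first_audio_samples, threshold=200):
    """First audio data (integer samples) -> second audio data (PCM bytes)."""
    return to_pcm16le(noise_gate(first_audio_samples, threshold))
```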
  • Step S300: send the second audio data to the cloud server;
  • the websocket mechanism is used to upload the second audio data.
  • the controlled device creates a socket connection and sends a file transfer request to the cloud server.
  • the cloud server receives the processed data from the controlled device.
  • the text recognition engine server recognizes the second audio data, generates a text recognition stream (the control command text) according to the recognition result, and sends the generated control command text to the controlled device.
  • the websocket mechanism used in this embodiment avoids having the controlled device send data to the cloud through HTTP requests. With HTTP, the client has to synchronize with the server and wait, which causes large network overhead, and data transmission from the controlled device faces several problems on an unstable network: how to guarantee that the data is delivered, how to ensure that the data is not sent repeatedly, and how to reconnect after the connection is dropped. In this embodiment, the controlled device and the cloud server establish a connection based on the websocket mechanism, which avoids these problems of HTTP transmission.
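  • The streaming upload of step S300 can be sketched over a plain stream socket. This is not the WebSocket protocol itself: the framing used here (a 4-byte big-endian length prefix per chunk and a zero-length frame as end-of-stream marker) and the names `send_frames`, `recv_exact` and `recv_frames` are assumptions chosen to keep the example self-contained with Python's standard `socket` module.

```python
import socket
import struct


def send_frames(sock, pcm, chunk_size=1024):
    """Send PCM bytes as length-prefixed frames, then a zero-length end marker."""
    for offset in range(0, len(pcm), chunk_size):
        chunk = pcm[offset:offset + chunk_size]
        sock.sendall(struct.pack("!I", len(chunk)) + chunk)
    sock.sendall(struct.pack("!I", 0))


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("peer closed before the frame was complete")
        buf.extend(part)
    return bytes(buf)


def recv_frames(sock):
    """Server side: reassemble frames until the end marker arrives."""
    out = bytearray()
    while True:
        (length,) = struct.unpack("!I", recv_exact(sock, 4))
        if length == 0:
            return bytes(out)
        out.extend(recv_exact(sock, length))
```

  • For a quick local check, `socket.socketpair()` gives two connected endpoints, so one end can play the controlled device and the other the cloud server.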
  • Step S400: receive the control command text issued by the cloud server, parse the control command text to obtain a control command, and execute the control command, wherein the control command text is obtained by the cloud server by processing the second audio data.
  • the controlled device parses the control command text to obtain the control command, and executes the control command.
  • for example, if the control command is a command to adjust the volume, the controlled device, according to the parsing result, calls the TV system API to adjust the volume after receiving the command text; commands such as power on/off, mute, change channel, and open an application (for example, open YouTube) are handled similarly.
  • since the control command text includes character strings and numeric values, the control command text in this embodiment is expressed in JSON (JavaScript Object Notation) format; it can be understood that, in other embodiments, the control command text may take other forms, which is not specifically limited herein.
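  • The parsing and execution in step S400 might look like the sketch below. The source only states that the command text is JSON containing strings and numeric values; the field names `action` and `value`, the action names, and the `TvStub` class standing in for the TV system API are all hypothetical.

```python
import json


class TvStub:
    """Stand-in for the TV system API; a real device would call platform services."""

    def __init__(self):
        self.volume = 10
        self.muted = False
        self.power = True
        self.opened = []


def execute_command_text(text, tv):
    """Parse control command text (JSON) and apply the command to the TV stub."""
    command = json.loads(text)
    action = command["action"]
    if action == "set_volume":
        # Clamp to an assumed 0..100 volume range.
        tv.volume = max(0, min(100, int(command["value"])))
    elif action == "mute":
        tv.muted = True
    elif action == "power":
        tv.power = command["value"] == "on"
    elif action == "open_app":
        tv.opened.append(command["value"])
    else:
        raise ValueError("unsupported action: %r" % action)
    return action
```

  • For example, `execute_command_text('{"action": "set_volume", "value": 35}', TvStub())` would set the stub's volume to 35 under these assumed field names.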
  • in this embodiment, first audio data sent by the remote control terminal is received, the first audio data being obtained by the remote control terminal by processing the acquired user voice remote control instruction; the first audio data is processed according to a preset rule to obtain second audio data; the second audio data is sent to the cloud server; the control command text issued by the cloud server is received and parsed to obtain the control command, and the control command is executed; the control command text is obtained by the cloud server by processing the second audio data.
  • thus, the user's voice remote control instruction is sampled and encoded by the remote control terminal, or digitized by other data processing methods, to form DMA data, that is, the first audio data, which is then transmitted to the controlled device through the established wireless connection.
  • the controlled device processes the first audio data again, for example through Alsa noise reduction, to generate a PCM file, that is, the second audio data, and uploads the second audio data to the cloud server through the websocket mechanism.
  • the cloud server performs text recognition on the second audio data and sends the recognized control command text, which can be in JSON format, to the controlled device.
  • the controlled device parses the received JSON text and executes the corresponding action. This addresses the problem that, with a traditional mechanical remote controller, users have to click repeatedly to open multi-level menus and search the program list one item at a time to find the program to watch or the setting to adjust; such searching is cumbersome, and the adjustment cannot respond in real time.
  • the controlled device performs the operation directly based on the user's voice remote control instruction, which improves convenience of operation, especially for elderly users and users with limited mobility, and also meets the real-time response needs of adjustment operations.
  • FIG. 3 is a schematic flowchart of a second embodiment of the voice remote control method of the present application. Based on the first embodiment of the voice remote control method described above, in this embodiment, before step S300, the step of sending the second audio data to the cloud server, the method further includes:
  • Step S201: create a Socket connection and send a connection request to the cloud server;
  • Step S202: receive the response of the cloud server to the connection request, and establish a Socket connection with the cloud server.
  • the disadvantage of data transmission based on the HTTP protocol is that the HTTP client has to synchronize with the server and wait, which imposes large network overhead on the device, and data transmission from a smart device faces several problems on an unstable network: how to guarantee that the data is delivered, how to ensure that the data is not sent repeatedly, and how to reconnect after the connection is dropped. HTTP cannot solve this type of problem.
  • the websocket mechanism is used to upload the recording file, that is, the second audio data.
  • the controlled device creates a socket connection and sends a file transfer request to the cloud server.
  • the cloud server receives the recording file and recognizes it as text.
  • the cloud server sets up a server-side socket to listen for requests, and the controlled device establishes a connection with the cloud server; the controlled device sends the recording file stream, that is, the second audio data, to the cloud server;
  • the text recognition engine server recognizes the recording file stream as text to form a text recognition stream, that is, the control command text;
  • the cloud server sends the text recognition stream to the controlled device;
  • the controlled device receives the control command text delivered by the cloud server, parses the control command text to obtain the control command, executes the control command, and closes the Socket connection to release resources.
  • This embodiment uses the websocket mechanism, which avoids having the controlled device send data to the cloud through HTTP requests. With HTTP, the client has to synchronize with the server and wait, resulting in large network overhead, and data transmission from the controlled device faces several problems on an unstable network: how to guarantee that the data is delivered, how to ensure that the data is not sent repeatedly, and how to reconnect after the connection is dropped. In this embodiment, the controlled device and the cloud server establish a connection based on the websocket mechanism, which avoids these problems of HTTP transmission.
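  • The source does not say how repeated sends and reconnection are handled once the connection exists. One conventional approach, shown here purely as an assumption, is to number each chunk so that the receiver can discard duplicates while the sender retries after a failure. `DedupReceiver`, `upload_with_retry` and the `send` callback are hypothetical names.

```python
class DedupReceiver:
    """Receiver that accepts each sequence number at most once."""

    def __init__(self):
        self.next_seq = 0
        self.data = bytearray()

    def accept(self, seq, chunk):
        """Store the chunk only if it is the next expected one; return the next expected seq."""
        if seq == self.next_seq:
            self.data.extend(chunk)
            self.next_seq += 1
        return self.next_seq


def upload_with_retry(chunks, send, max_attempts=3):
    """Send numbered chunks in order, retrying a chunk when send() raises ConnectionError."""
    for seq, chunk in enumerate(chunks):
        for _ in range(max_attempts):
            try:
                send(seq, chunk)
                break
            except ConnectionError:
                continue  # a real client would reconnect here before retrying
        else:
            raise ConnectionError("chunk %d was not acknowledged" % seq)
```

  • With this scheme, a chunk that was delivered but whose acknowledgement was lost is retransmitted by the sender and silently ignored by the receiver, so the reassembled stream contains it only once.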
  • FIG. 4 is a schematic flowchart of a third embodiment of the voice remote control method of the present application. Based on the second embodiment of the voice remote control method, in this embodiment, step S200, the step of processing the first audio data according to a preset rule to obtain the second audio data, includes:
  • Step S210: acquire a preset audio optimization standard;
  • Step S220: optimize the first audio data based on the acquired audio optimization standard, and use the optimized first audio data as the second audio data.
  • the main chip of the controlled device processes the first audio data.
  • while the microphone input module built into the remote control terminal collects the user's voice remote control instruction, it also collects the environmental noise parameters of the current scene; that is, the first audio data includes the digitized user voice remote control instruction and the environmental noise parameters of the current scene. After the controlled device receives the first audio data, it retrieves, according to the environmental noise parameters included in the first audio data, a preset inverse-phase noise signal matching those parameters and uses it to cancel the environmental noise, thereby performing noise reduction on the first audio data, and uploads the noise-reduced first audio data to the cloud server as the second audio data. It can be understood that, in other embodiments, the audio optimization standard may be implemented in other ways and is not limited to the implementation described in this embodiment.
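  • The inverse-phase cancellation described above can be illustrated with a toy model in which the environmental noise parameter is just a key into a table of preset periodic noise profiles. The table contents, the keys `fan` and `street`, and the function names are invented for illustration; real noise is not perfectly periodic, so practical cancellation would only be approximate.

```python
# Hypothetical preset noise profiles, keyed by an environmental noise parameter.
PRESET_NOISE_PROFILES = {
    "fan": [3, -2, 3, -2],
    "street": [5, 1, -4, 2],
}


def inverse_phase(profile):
    """Return the anti-phase signal: same magnitude, opposite sign."""
    return [-n for n in profile]


def cancel_noise(samples, noise_id):
    """Add the preset inverse-phase signal (repeated periodically) to the samples."""
    anti = inverse_phase(PRESET_NOISE_PROFILES[noise_id])
    return [s + anti[i % len(anti)] for i, s in enumerate(samples)]
```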
  • in this embodiment, a preset audio optimization standard is obtained; the first audio data is optimized based on the obtained audio optimization standard, and the optimized first audio data is used as the second audio data; a Socket connection is created and a connection request is sent to the cloud server; the cloud server's response to the connection request is received and a Socket connection is established with the cloud server; the second audio data is sent to the cloud server; the control command text issued by the cloud server is received and parsed to obtain the control command, and the control command is executed. Thus, while improving the convenience of user operation and meeting the real-time response needs of adjustment operations, the recognition accuracy of voice control instructions is also improved, which ensures the effectiveness of voice control.
  • FIG. 5 is a schematic flowchart of a fourth embodiment of the voice remote control method of the present application. Based on the third embodiment of the voice remote control method, in this embodiment, before step S100, the step of receiving the first audio data sent by the remote control terminal, the method further includes:
  • Step S101: detect whether a preset write instruction is received;
  • if yes, go to step S100 and receive the first audio data sent by the remote control terminal.
  • before step S300, the step of sending the second audio data to the cloud server, the method further includes:
  • Step S301: detect whether a preset read instruction is received;
  • if yes, go to step S300 and send the second audio data to the cloud server.
  • the controlled device uses the Alsa (Advanced Linux Sound Architecture) audio driver, which supports Bluetooth sound devices. Alsa's read and write operations are triggered by write and read instructions issued through function calls set by the user. In this embodiment, the controlled device receives the first audio data sent by the remote control terminal after detecting that the preset write instruction has been received, and sends the second audio data to the cloud server after detecting that the preset read instruction has been received.
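  • The gating in steps S101 and S301 can be sketched as a small state holder: audio is accepted only after a write instruction has been seen, and released for upload only after a read instruction. This does not call the ALSA API; `GatedAudioPipe` and the instruction names `"write"` and `"read"` are hypothetical.

```python
class GatedAudioPipe:
    """Buffers audio on the controlled device, gated by write/read instructions."""

    def __init__(self):
        self.write_enabled = False
        self.read_enabled = False
        self._buffer = bytearray()

    def on_instruction(self, name):
        """Record that a preset instruction was detected (steps S101 / S301)."""
        if name == "write":
            self.write_enabled = True
        elif name == "read":
            self.read_enabled = True
        else:
            raise ValueError("unknown instruction: %r" % name)

    def receive_first_audio(self, data):
        """Step S100: accept audio only if a write instruction was detected."""
        if not self.write_enabled:
            return False
        self._buffer.extend(data)
        return True

    def take_for_upload(self):
        """Step S300: hand over buffered audio only if a read instruction was detected."""
        if not self.read_enabled:
            return None
        data = bytes(self._buffer)
        self._buffer.clear()
        return data
```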
  • FIG. 6 is a schematic flowchart of a fifth embodiment of the voice remote control method of the present application.
  • in this embodiment, step S100, the step of receiving the first audio data sent by the remote control terminal, includes:
  • Step S110: in response to the Bluetooth pairing request sent by the remote control terminal, establish a Bluetooth connection with the remote control terminal;
  • Step S120: based on the Bluetooth connection, receive the first audio data sent by the remote control terminal.
  • the remote control terminal has a built-in first Bluetooth module
  • the controlled device has a built-in second Bluetooth module.
  • the first Bluetooth module establishes a wireless Bluetooth connection with the second Bluetooth module through search, scanning, and pairing; over this connection the remote control terminal transmits the original audio data queue to the controlled device, that is, the remote control terminal sends the first audio data to the controlled device.
  • the wireless connection between the remote control terminal and the controlled device is not limited to the Bluetooth connection, and may also be other wireless connection methods, which are not specifically limited in this embodiment.
  • the embodiments of the present application also provide a voice remote control system, which includes a remote control terminal, a controlled device, and a cloud server;
  • the remote control terminal is used to obtain a user's voice remote control instruction based on a preset condition, and is also used to perform analog-to-digital conversion on the voice remote control instruction to obtain first audio data and send the first audio data to the controlled device;
  • the controlled device is configured to, after receiving the first audio data sent by the remote control terminal, process the first audio data according to a preset rule to obtain second audio data, and send the second audio data to the cloud server;
  • the cloud server is configured to, after receiving the second audio data, recognize the second audio data according to a preset recognition rule, generate control command text, and send the control command text to the controlled device;
  • the controlled device is further configured to receive the control command text delivered by the cloud server, parse the control command text to obtain a control command, and execute the control command.
  • the remote control terminal includes:
  • the receiving unit is configured to receive a start recording instruction or a stop recording instruction input by a user, and send the received start recording instruction or stop recording instruction to the recording unit;
  • the recording unit is used to detect the user's voice remote control instruction and record the detected voice remote control instruction after receiving the start recording instruction.
  • the recording unit is also used to stop the recording action after receiving the stop recording instruction and save the recorded user voice remote control instruction;
  • a processing unit configured to perform analog-to-digital conversion processing on the voice remote control instruction to obtain first audio data
  • the sending unit is configured to send the first audio data to the controlled device.
  • the remote control terminal has a physical or touch voice key that triggers capture of the user's voice remote control instruction.
  • pressing the voice key starts recording and releasing it stops recording, so that only relevant data is collected; this avoids the unnecessary recognition load and transmission bandwidth that would result from the remote control terminal continuously monitoring ambient speech, and improves the accuracy of voice remote control.
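  • The press-to-record behaviour of the receiving unit and recording unit can be sketched as follows. The class name `PushToTalkRecorder`, its method names, and the idea that audio arrives as byte frames are assumptions; the source only describes start and stop recording instructions tied to the voice key.

```python
class PushToTalkRecorder:
    """Records audio frames only while the voice key is held down."""

    def __init__(self):
        self.recording = False
        self._frames = []
        self.saved = None  # last completed recording, if any

    def key_down(self):
        """Start recording instruction: begin a fresh recording."""
        self.recording = True
        self._frames = []

    def on_audio_frame(self, frame):
        """Microphone callback: frames outside a key press are dropped."""
        if self.recording:
            self._frames.append(frame)

    def key_up(self):
        """Stop recording instruction: stop and save what was captured."""
        self.recording = False
        self.saved = b"".join(self._frames)
        return self.saved
```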
  • an embodiment of the present application also provides a controlled device, the controlled device includes:
  • a receiving module configured to receive first audio data sent by a remote control terminal, and the first audio data is processed by the remote control terminal according to the obtained user voice remote control instruction;
  • a processing module configured to process the first audio data according to a preset rule to obtain second audio data
  • An upload module used to send the second audio data to the cloud server
  • An execution module configured to receive control command text issued by the cloud server, parse the control command text to obtain a control command, and execute the control command; the control command text is obtained by the cloud server by processing the second audio data.
  • the device further includes:
  • A creation module used to create a Socket connection and send a connection request to the cloud server
  • the connection module is configured to receive a response of the cloud server to the connection request, and establish a Socket connection with the cloud server.
  • the processing module includes:
  • An obtaining unit used to obtain preset audio optimization standards
  • the optimization unit is configured to optimize the first audio data based on the obtained audio optimization standard, and use the optimized first audio data as second audio data.
  • the device further includes:
  • the first detection module is used to detect whether a preset write instruction is received
  • the receiving module is further configured to receive the first audio data sent by the remote control terminal when the detection result of the first detection module is "Yes".
  • the device further includes:
  • the second detection module is used to detect whether a preset reading instruction is received
  • the uploading module is also used to send the second audio data to the cloud server when the detection result of the second detection module is "Yes".
  • the receiving module includes:
  • the pairing unit is configured to establish a Bluetooth connection with the remote control terminal in response to the Bluetooth pairing request sent by the remote control terminal;
  • the audio acquisition unit is configured to receive the first audio data sent by the remote control terminal based on the Bluetooth connection.
  • the embodiments of the present application also provide a computer-readable storage medium, which may be a non-volatile readable storage medium, and which stores computer-readable instructions for voice remote control.
  • when the computer-readable instructions for voice remote control are executed by the processor, the following steps are implemented:
  • the first audio data is processed by the remote control terminal according to the obtained user voice remote control instruction
  • the control command text is processed by the cloud server according to the second audio data.
  • the voice remote computer-readable instructions are executed by the processor to implement the following steps:
  • the step of processing the first audio data according to a preset rule to obtain second audio data includes:
  • the first audio data is optimized based on the obtained audio optimization standard, and the optimized first audio data is used as the second audio data.
  • the voice remote computer-readable instructions are executed by the processor to implement the following steps:
  • step of receiving the first audio data sent by the remote control terminal includes:
  • the method implemented when the computer-readable instructions of the voice remote control running on the processor are executed can refer to the embodiments of the voice remote control method of the present application, which will not be repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Selective Calling Equipment (AREA)
  • Telephonic Communication Services (AREA)

Abstract

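A voice remote control method, system, controlled device, and computer-readable storage medium. The method comprises: receiving first audio data sent by a remote control terminal (S100), the first audio data being obtained by the remote control terminal by processing an acquired user voice remote control instruction; processing the first audio data according to a preset rule to obtain second audio data (S200); sending the second audio data to a cloud server (S300); and receiving control command text issued by the cloud server, parsing the control command text to obtain a control command, and executing the control command (S400), the control command text being obtained by the cloud server by processing the second audio data. Remotely controlling the controlled device through voice control instructions solves the problems of complicated operation and slow response with traditional mechanical remote controls.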

Description

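Voice remote control method, system, controlled device, and computer-readable storage medium
This application claims priority to Chinese patent application No. 201811599357.0, entitled "Voice remote control method, system, controlled device, and computer-readable storage medium" and filed with the Chinese Patent Office on December 25, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of intelligent remote control, and in particular to a voice remote control method, system, controlled device, and computer-readable storage medium.
Background
At present, most consumer electronic devices are controlled by a user operating a mechanical remote control. For example, when watching television, the user has to operate the remote control manually to search for channels, adjust the volume, switch programs, open or close applications, and adjust television picture and sound parameters. However, the user often has to press the remote control many times to open multi-level menus, look through the program list one entry at a time for the program to watch, or hunt for the button for the parameter to be adjusted. This search process is very cumbersome and causes considerable inconvenience, and adjusting parameters such as volume through a mechanical remote control also cannot meet the requirement for real-time response.
Summary
The main purpose of the present application is to provide a voice remote control method, system, controlled device, and computer-readable storage medium that remotely control the controlled device through voice control instructions, so as to solve the problems of complicated operation and slow response with traditional mechanical remote controls.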
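To achieve the above purpose, the present application provides a voice remote control method applied to a controlled device, the voice remote control method comprising the following steps:
receiving first audio data sent by a remote control terminal, the first audio data being obtained by the remote control terminal by processing an acquired user voice remote control instruction;
processing the first audio data according to a preset rule to obtain second audio data;
sending the second audio data to a cloud server;
receiving control command text issued by the cloud server, parsing the control command text to obtain a control command, and executing the control command, the control command text being obtained by the cloud server by processing the second audio data.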
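Optionally, before the step of sending the second audio data to the cloud server, the method further comprises:
creating a Socket connection and sending a connection request to the cloud server;
receiving a response of the cloud server to the connection request, and establishing the Socket connection with the cloud server.
Optionally, the step of processing the first audio data according to a preset rule to obtain second audio data comprises:
obtaining a preset audio optimization standard;
optimizing the first audio data based on the obtained audio optimization standard, and using the optimized first audio data as the second audio data.
Optionally, before the step of receiving the first audio data sent by the remote control terminal, the method further comprises:
detecting whether a preset write instruction has been received;
if so, proceeding to the step of receiving the first audio data sent by the remote control terminal.
Optionally, before the step of sending the second audio data to the cloud server, the method further comprises:
detecting whether a preset read instruction has been received;
if so, proceeding to the step of sending the second audio data to the cloud server.
Optionally, the step of receiving the first audio data sent by the remote control terminal comprises:
establishing a Bluetooth connection with the remote control terminal in response to a Bluetooth pairing request sent by the remote control terminal;
receiving, over the Bluetooth connection, the first audio data sent by the remote control terminal.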
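In addition, the present application further provides a voice remote control system, the voice remote control system comprising a remote control terminal, a controlled device, and a cloud server;
the remote control terminal is configured to acquire a user voice remote control instruction based on a preset condition, and is further configured to perform analog-to-digital conversion on the voice remote control instruction to obtain first audio data and to send the first audio data to the controlled device;
the controlled device is configured to, after receiving the first audio data sent by the remote control terminal, process the first audio data according to a preset rule to obtain second audio data, and send the second audio data to the cloud server;
the cloud server is configured to, after receiving the second audio data, recognize the second audio data according to a preset recognition rule, generate control command text, and send the control command text to the controlled device;
the controlled device is further configured to receive the control command text issued by the cloud server, parse the control command text to obtain a control command, and execute the control command.
Optionally, the remote control terminal comprises:
a receiving unit configured to receive a start-recording instruction or a stop-recording instruction input by the user, and to forward the received start-recording instruction or stop-recording instruction to:
a recording unit configured to, after receiving the start-recording instruction, detect the user voice remote control instruction and record the detected voice remote control instruction, the recording unit being further configured to, after receiving the stop-recording instruction, stop recording and save the recorded user voice remote control instruction;
a processing unit configured to perform analog-to-digital conversion on the voice remote control instruction to obtain the first audio data;
a sending unit configured to send the first audio data to the controlled device.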
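In addition, to achieve the above purpose, the present application further provides a controlled device, the controlled device comprising:
a receiving module configured to receive first audio data sent by a remote control terminal, the first audio data being obtained by the remote control terminal by processing an acquired user voice remote control instruction;
a processing module configured to process the first audio data according to a preset rule to obtain second audio data;
an uploading module configured to send the second audio data to a cloud server;
an execution module configured to receive control command text issued by the cloud server, parse the control command text to obtain a control command, and execute the control command, the control command text being obtained by the cloud server by processing the second audio data.
In addition, to achieve the above purpose, the present application further provides a computer-readable storage medium storing voice remote control computer-readable instructions which, when executed by a processor, implement the following steps:
receiving first audio data sent by a remote control terminal, the first audio data being obtained by the remote control terminal by processing an acquired user voice remote control instruction;
processing the first audio data according to a preset rule to obtain second audio data;
sending the second audio data to a cloud server;
receiving control command text issued by the cloud server, parsing the control command text to obtain a control command, and executing the control command, the control command text being obtained by the cloud server by processing the second audio data.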
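The present application receives first audio data sent by a remote control terminal, the first audio data being obtained by the remote control terminal by processing an acquired user voice remote control instruction; processes the first audio data according to a preset rule to obtain second audio data; sends the second audio data to a cloud server;
and receives control command text issued by the cloud server, parses the control command text to obtain a control command, and executes the control command, the control command text being obtained by the cloud server by processing the second audio data. This effectively solves the problem that, with a traditional mechanical remote control, the user has to press the remote control many times to open multi-level menus and look through the program list one entry at a time for the program to watch, or hunt for the button for the parameter to be adjusted, so that the search is cumbersome and the adjustment response cannot meet real-time requirements. With the voice remote control method of the present application, the controlled device performs operations directly based on the user's voice remote control instruction, which greatly improves the user's ease of operation and meets the real-time response requirement of adjustment operations.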
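Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the hardware operating environment involved in the embodiments of the present application;
FIG. 2 is a schematic flowchart of a first embodiment of the voice remote control method of the present application;
FIG. 3 is a schematic flowchart of a second embodiment of the voice remote control method of the present application;
FIG. 4 is a schematic flowchart of a third embodiment of the voice remote control method of the present application;
FIG. 5 is a schematic flowchart of a fourth embodiment of the voice remote control method of the present application;
FIG. 6 is a schematic flowchart of a fifth embodiment of the voice remote control method of the present application.
The realization of the purpose, the functional features, and the advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are merely intended to explain the present application and are not intended to limit it.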
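As shown in FIG. 1, FIG. 1 is a schematic structural diagram of the hardware operating environment involved in the embodiments of the present application.
It should be noted that FIG. 1 may be a schematic structural diagram of the hardware operating environment of a voice remote control device. The voice remote control device of the embodiments of the present application may be a terminal device such as a PC or a portable computer.
As shown in FIG. 1, the voice remote control device may include a processor 1001 (for example, a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to implement connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art will understand that the structure of the voice remote control device shown in FIG. 1 does not limit the voice remote control device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a voice remote control program. The operating system is a program that manages and controls the hardware and software resources of the voice remote control device and supports the running of the voice remote control program and other software or programs.
In the voice remote control device shown in FIG. 1, the user interface 1003 is mainly used for data communication with the terminals; the network interface 1004 is mainly used to connect to a background server and exchange data with it; and the processor 1001 may be used to call the voice remote control program stored in the memory 1005 and perform the following operations: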
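receiving first audio data sent by a remote control terminal, the first audio data being obtained by the remote control terminal by processing an acquired user voice remote control instruction;
processing the first audio data according to a preset rule to obtain second audio data;
sending the second audio data to a cloud server;
receiving control command text issued by the cloud server, parsing the control command text to obtain a control command, and executing the control command, the control command text being obtained by the cloud server by processing the second audio data.
Based on the above structure, the embodiments of the voice remote control method of the present application are proposed.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the voice remote control method of the present application.
The embodiments of the present application provide embodiments of the voice remote control method. It should be noted that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one given here.
The voice remote control method of the embodiments of the present application is applied to a controlled device, which may be a terminal device such as a smart television or the set-top box of a digital television; no specific limitation is imposed here.
The voice remote control method of this embodiment comprises:
Step S100: receiving first audio data sent by a remote control terminal, wherein the first audio data is obtained by the remote control terminal by processing an acquired user voice remote control instruction;
At present, most consumer electronic devices are controlled by a user through a mechanical remote control. For example, when watching television, the user has to operate the remote control manually to search for channels, adjust the volume, switch programs, switch signal sources, open or close applications, power the set on or off, and adjust television picture and sound parameters. However, the user often has to press the remote control many times to open multi-level menus, look through the program list one entry at a time for the program to watch, or hunt for the button for the parameter to be adjusted. This search process is very cumbersome and causes considerable inconvenience, and the adjustment response also cannot meet the requirement for real-time response.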
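In this embodiment, as one implementation, the remote control terminal has a built-in microphone input module. After the microphone input module acquires the user voice remote control instruction, the MCU (Microcontroller Unit) built into the remote control terminal processes the acquired instruction, which is an analog signal. The processing may consist of recognizing keywords, extracting the core of the voice instruction, and applying sampling, PDM (Pulse Density Modulation), MCU encoding, and so on to the extracted core. The analog voice remote control instruction is thereby converted into a digital signal that forms DMA (Direct Memory Access) data, that is, the first audio data, which is then sent to the controlled device.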
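The PDM step named above can be illustrated with a first-order error-feedback (sigma-delta style) encoder. This is only a minimal sketch under assumptions that do not come from the application: the function name `pdm_encode`, the normalised input range of -1.0 to 1.0, and the first-order scheme itself are illustrative, since the application does not say how the MCU performs the modulation.

```python
def pdm_encode(samples):
    """Encode samples in [-1.0, 1.0] as a list of PDM bits (1 = high, 0 = low).

    Hypothetical first-order error-feedback encoder; the application only
    names PDM as one possible processing step and gives no algorithm.
    """
    bits = []
    error = 0.0  # running difference between the input and the emitted pulses
    for x in samples:
        error += x
        pulse = 1.0 if error > 0 else -1.0
        bits.append(1 if pulse > 0 else 0)
        error -= pulse
    return bits


# The density of ones tracks the input level: a constant 0.5 gives about 75% ones.
density = sum(pdm_encode([0.5] * 1000)) / 1000
```

The running error stays bounded, so the average of the emitted pulses follows the average of the input, which is the property that lets a receiver recover the waveform by low-pass filtering.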
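In this embodiment, the controlled device receives the first audio data sent by the remote control terminal. As one implementation, the remote control terminal establishes a wireless connection, such as a Bluetooth connection, with the controlled device, and transmits the first audio data to the controlled device over that connection.
Step S200: processing the first audio data according to a preset rule to obtain second audio data;
Specifically, after receiving the first audio data sent by the remote control terminal, the controlled device processes the first audio data. As one implementation, the first audio data may be denoised through ALSA (Advanced Linux Sound Architecture) to generate a PCM file, and the generated PCM file, that is, the second audio data, is then uploaded to the cloud server. The second audio data is the processed recording file stream.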
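To make the "PCM file" concrete, the sketch below packs processed samples as raw signed 16-bit little-endian PCM bytes. The sample format, the clipping behaviour, and the function names `to_pcm16` and `from_pcm16` are assumptions for illustration; the application does not specify the PCM layout, and no ALSA API is called here.

```python
import struct


def to_pcm16(samples):
    """Pack integer samples as raw signed 16-bit little-endian PCM bytes.

    Assumed format (mono, 16-bit); values outside the int16 range are clipped.
    """
    clipped = [max(-32768, min(32767, int(s))) for s in samples]
    return struct.pack("<%dh" % len(clipped), *clipped)


def from_pcm16(data):
    """Unpack raw 16-bit little-endian PCM bytes back into a list of ints."""
    return list(struct.unpack("<%dh" % (len(data) // 2), data))


second_audio = to_pcm16([0, 1000, -1000, 40000])
```

Each sample occupies two bytes, so the byte string can be written to a file or streamed to a server unchanged.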
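Step S300: sending the second audio data to the cloud server;
In this embodiment, as one implementation, a WebSocket mechanism is used to upload the second audio data. The controlled device creates a Socket connection and sends a file transfer request to the cloud server. After the cloud server receives the recording file stream processed by the controlled device, that is, the second audio data, a text recognition engine server recognizes the second audio data, generates a text recognition stream, that is, the control command text, from the recognition result, and sends the generated control command text to the controlled device.
It should be noted that the WebSocket mechanism used in this embodiment avoids the problems that arise when the controlled device sends data to the cloud through HTTP requests. Because an HTTP client has to synchronize with the server, that is, wait for it, the network overhead is high, and the data transmission of the controlled device faces many problems: for example, how to ensure that data is transmitted correctly when the network is unstable, how to ensure that data is not sent repeatedly, and how to reconnect after the connection is dropped. In this embodiment the controlled device and the cloud server establish their connection based on the WebSocket mechanism, which avoids these problems of HTTP transmission.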
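Step S400: receiving control command text issued by the cloud server, parsing the control command text to obtain a control command, and executing the control command, wherein the control command text is obtained by the cloud server by processing the second audio data.
Further, after receiving the control command text from the cloud server, the controlled device parses the control command text to obtain a control instruction and executes it. Taking a smart television as the controlled device, if the control instruction obtained is the parsed result of a volume adjustment command, the controlled device, after receiving that command text, calls the TV system API to adjust the volume. Similar commands include Power On/Off, Mute, Change channel, and Open YouTube.
In this embodiment, because the control command text contains strings and numeric values, it is expressed in JSON (JavaScript Object Notation) format. It can be understood that in other embodiments the control command text may take other forms; no specific limitation is imposed here.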
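Parsing JSON control command text and dispatching it could look like the sketch below. The `{"command": ..., "value": ...}` schema, the command names, and the `FakeTV` class standing in for the TV system API are all hypothetical; the application states only that the text is JSON containing strings and numeric values.

```python
import json


class FakeTV:
    """Stand-in for the TV system API (hypothetical, for illustration only)."""

    def __init__(self):
        self.volume = 10
        self.muted = False

    def set_volume(self, value):
        self.volume = max(0, min(100, int(value)))

    def mute(self, value=None):
        self.muted = True


def execute_command_text(tv, command_text):
    """Parse JSON control command text and dispatch it to the TV object.

    Returns True if a handler ran and False for an unknown command.
    """
    message = json.loads(command_text)
    handlers = {"set_volume": tv.set_volume, "mute": tv.mute}
    handler = handlers.get(message.get("command"))
    if handler is None:
        return False  # unknown command: ignore rather than guess
    handler(message.get("value"))
    return True


tv = FakeTV()
handled = execute_command_text(tv, '{"command": "set_volume", "value": 30}')
```

A dispatch table keeps the mapping from command names to actions in one place, so adding a command such as channel switching would mean adding one entry and one method.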
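The present application receives first audio data sent by a remote control terminal, the first audio data being obtained by the remote control terminal by processing an acquired user voice remote control instruction; processes the first audio data according to a preset rule to obtain second audio data; sends the second audio data to a cloud server; and receives control command text issued by the cloud server, parses the control command text to obtain a control command, and executes the control command, the control command text being obtained by the cloud server by processing the second audio data. In this way, the remote control terminal samples and encodes the user voice remote control instruction, or digitizes it by another data processing method, to form DMA data, that is, the first audio data, and transmits the first audio data to the controlled device over the established wireless connection. The controlled device processes the first audio data again, for example by ALSA noise reduction to generate a PCM file, that is, the second audio data, and uploads the second audio data to the cloud server through the WebSocket mechanism. The cloud server performs text recognition on the second audio data and sends the resulting control command text, which may be in JSON format, to the controlled device. Finally, the controlled device parses the received JSON text and performs the corresponding action. This effectively solves the problem that, with a traditional mechanical remote control, the user has to press the remote control many times to open multi-level menus and look through the program list one entry at a time for the program to watch, or hunt for the button for the parameter to be adjusted, so that the search is very cumbersome and the adjustment response cannot meet real-time requirements. With the voice remote control method of the present application, the controlled device performs operations directly based on the user's voice remote control instruction, which greatly improves ease of operation, especially for elderly users and users with limited mobility, and also meets the real-time response requirement of adjustment operations.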
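The end-to-end flow on the controlled device can be summarised as a small pipeline. This is a sketch only: the function name `run_voice_command` and the four injected callables are placeholders introduced here, standing in for the processing, upload, parsing, and execution stages that the application describes in prose.

```python
def run_voice_command(first_audio, process, upload, parse, execute):
    """Run steps S200 to S400 on audio already received in step S100.

    All four callables are injected placeholders (assumptions), so the
    control flow can be read without any real audio or network code.
    """
    second_audio = process(first_audio)   # S200: preset-rule processing
    command_text = upload(second_audio)   # S300: send to cloud, get text back
    command = parse(command_text)         # S400: parse the command text
    return execute(command)               # S400: execute the command


log = []
result = run_voice_command(
    b"raw",
    process=lambda audio: audio.upper(),
    upload=lambda audio: "mute" if audio == b"RAW" else "unknown",
    parse=lambda text: {"command": text},
    execute=lambda command: log.append(command) or command["command"],
)
```

Keeping each stage behind a callable makes it easy to swap, for example, the upload transport without touching the parsing or execution code.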
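Further, a second embodiment of the voice remote control method of the present application is proposed.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of the second embodiment of the voice remote control method of the present application. Based on the first embodiment above, in this embodiment, before step S300 of sending the second audio data to the cloud server, the method further comprises:
Step S201: creating a Socket connection and sending a connection request to the cloud server;
Step S202: receiving a response of the cloud server to the connection request, and establishing the Socket connection with the cloud server.
The drawback of sending data over the HTTP protocol is that the HTTP client has to synchronize with the server, that is, wait for it, which imposes a high network overhead on the device. The data transmission of smart devices faces many problems: for example, how to ensure that data is transmitted correctly when the network is unstable, how to ensure that data is not sent repeatedly, and how to reconnect after the connection is dropped. HTTP cannot solve such problems.
In this embodiment, a WebSocket mechanism is used to upload the recording file, that is, the second audio data. The controlled device creates a Socket connection and sends a file transfer request to the cloud server, and the cloud server receives the recording file and recognizes it as text. The specific process is as follows: the controlled device creates a Socket connection and sends a request to the cloud server; the cloud server sets up a server-side Socket to listen for the request; the connection between the controlled device and the cloud server is established; the controlled device sends the recording file stream, that is, the second audio data, to the cloud server; after receiving the recording file stream, the cloud server has the text recognition engine server recognize it as text, forming a text recognition stream, that is, the control command text; the cloud server sends the text recognition stream to the controlled device; the controlled device receives the control command text issued by the cloud server, parses it to obtain the control command, and executes the control command; and the Socket connection is closed to release resources.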
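The exchange described above (upload the audio stream, receive the command text, close the connection) can be sketched with Python's standard `socket` module. Several things here are assumptions: `socket.socketpair()` stands in for the real device-to-cloud connection, the 4-byte length prefix is an illustrative framing choice, the reply content is made up, and this is a plain stream socket rather than an implementation of the WebSocket protocol.

```python
import json
import socket
import struct


def send_frame(sock, payload):
    """Send one length-prefixed frame (4-byte big-endian length + payload)."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, count):
    """Read exactly `count` bytes or raise if the peer closes early."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("socket closed before frame was complete")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock):
    """Receive one length-prefixed frame."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


# socketpair() stands in for the device-to-cloud connection in this sketch.
device, cloud = socket.socketpair()
try:
    send_frame(device, b"\x00\x01\x02\x03")           # device uploads audio
    audio = recv_frame(cloud)                          # cloud receives it
    reply = json.dumps({"command": "mute"}).encode()   # pretend recognition
    send_frame(cloud, reply)                           # cloud sends command text
    command_text = recv_frame(device).decode()         # device receives it
finally:
    device.close()                                     # release resources
    cloud.close()
```

Explicit framing matters on a stream socket because a single `recv` call may return only part of a message; the length prefix tells the receiver when the audio or the command text is complete.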
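The WebSocket mechanism used in this embodiment avoids the problems that arise when the controlled device sends data to the cloud through HTTP requests. Because an HTTP client has to synchronize with the server, that is, wait for it, the network overhead is high, and the data transmission of the controlled device faces many problems: for example, how to ensure that data is transmitted correctly when the network is unstable, how to ensure that data is not sent repeatedly, and how to reconnect after the connection is dropped. In this embodiment the controlled device and the cloud server establish their connection based on the WebSocket mechanism, which avoids these problems of HTTP transmission.
Further, a third embodiment of the voice remote control method of the present application is proposed.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of the third embodiment of the voice remote control method of the present application. Based on the second embodiment above, in this embodiment, step S200 of processing the first audio data according to a preset rule to obtain second audio data comprises:
Step S210: obtaining a preset audio optimization standard;
Step S220: optimizing the first audio data based on the obtained audio optimization standard, and using the optimized first audio data as the second audio data.
In this embodiment, after the controlled device receives the DMA data, that is, the first audio data, sent by the remote control terminal, the main chip of the controlled device processes the first audio data. As one implementation, while the microphone input module built into the remote control terminal captures the user voice remote control instruction, it also captures the ambient noise parameters of the current scene, so the first audio data includes both the digitized user voice remote control instruction and the ambient noise parameters of the current scene. After receiving the first audio data, the controlled device retrieves, according to the ambient noise parameters included in the first audio data, a preset anti-phase noise signal matching those parameters and uses it to cancel the ambient noise, thereby denoising the first audio data. The denoised first audio data is uploaded to the cloud server as the second audio data. It can be understood that in other embodiments the audio optimization standard may be implemented in other ways and is not limited to the implementation described in this embodiment.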
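Cancellation with an anti-phase signal amounts to adding a sign-inverted copy of the noise to the recording. The sketch below shows that arithmetic on integer samples; the function name `cancel_noise`, the assumption that the stored noise profile is time-aligned with the recording, and the cyclic repetition of a short profile are all illustrative and not taken from the application.

```python
def cancel_noise(first_audio, noise_profile):
    """Add the phase-inverted noise profile to the samples, sample by sample.

    Assumes the stored profile is time-aligned with the recording; it is
    repeated cyclically if it is shorter than the audio.
    """
    if not noise_profile:
        return list(first_audio)
    inverted = [-n for n in noise_profile]  # anti-phase copy of the noise
    return [
        sample + inverted[i % len(inverted)]
        for i, sample in enumerate(first_audio)
    ]


speech = [0, 100, 200, 100, 0, -100]
noise = [5, -5, 5, -5, 5, -5]
noisy = [s + n for s, n in zip(speech, noise)]
second_audio = cancel_noise(noisy, noise[:2])
```

In this idealised case the noise is perfectly periodic and known, so the original samples are recovered exactly; real ambient noise would only be partly cancelled.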
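In this embodiment, the first audio data sent by the remote control terminal is received; a preset audio optimization standard is obtained; the first audio data is optimized based on the obtained standard and the optimized first audio data is used as the second audio data; a Socket connection is created and a connection request is sent to the cloud server; the cloud server's response to the connection request is received and the Socket connection with the cloud server is established; the second audio data is sent to the cloud server; and the control command text issued by the cloud server is received and parsed to obtain a control command, which is executed. As a result, in addition to improving the user's ease of operation and meeting the real-time response requirement of adjustment operations, this improves the accuracy of voice control command recognition and ensures the effectiveness of voice control.
Further, a fourth embodiment of the voice remote control method of the present application is proposed.
Referring to FIG. 5, FIG. 5 is a schematic flowchart of the fourth embodiment of the voice remote control method of the present application. Based on the third embodiment above, in this embodiment, before step S100 of receiving the first audio data sent by the remote control terminal, the method further comprises:
Step S101: detecting whether a preset write instruction has been received;
if so, proceeding to step S100 of receiving the first audio data sent by the remote control terminal.
Further, in this embodiment, before step S300 of sending the second audio data to the cloud server, the method further comprises:
Step S301: detecting whether a preset read instruction has been received;
if so, proceeding to step S300 of sending the second audio data to the cloud server.
In this embodiment, the controlled device uses the ALSA (Advanced Linux Sound Architecture) audio driver, which supports Bluetooth sound devices. ALSA read and write operations are triggered by write and read instructions invoked through user-set functions. In this embodiment, after detecting that the preset write instruction has been received, the controlled device receives the first audio data sent by the remote control terminal; after detecting that the preset read instruction has been received, it sends the second audio data to the cloud server.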
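The two checks in steps S101 and S301 behave like gates in front of receiving and uploading. A minimal sketch of that gating follows; the class name `GatedDevice`, the instruction names `"write"` and `"read"`, and the buffering are hypothetical, and no ALSA call is made.

```python
class GatedDevice:
    """Controlled-device sketch that gates receiving and uploading.

    Instruction names and methods are hypothetical; no ALSA call is made.
    """

    def __init__(self):
        self.write_enabled = False
        self.read_enabled = False
        self.buffer = []

    def on_instruction(self, name):
        """Record that a preset write or read instruction has arrived."""
        if name == "write":
            self.write_enabled = True
        elif name == "read":
            self.read_enabled = True

    def receive(self, first_audio):
        """Step S100: accept audio only after the write instruction."""
        if not self.write_enabled:
            return False
        self.buffer.append(first_audio)
        return True

    def upload(self, send):
        """Step S300: send buffered audio only after the read instruction."""
        if not self.read_enabled:
            return False
        for chunk in self.buffer:
            send(chunk)
        self.buffer.clear()
        return True


device = GatedDevice()
sent = []
ignored = device.receive(b"early")     # no write instruction yet
device.on_instruction("write")
accepted = device.receive(b"audio")
blocked = device.upload(sent.append)   # no read instruction yet
device.on_instruction("read")
uploaded = device.upload(sent.append)
```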
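Further, a fifth embodiment of the voice remote control method of the present application is proposed.
Referring to FIG. 6, FIG. 6 is a schematic flowchart of the fifth embodiment of the voice remote control method of the present application. Based on the first embodiment above, in this embodiment, step S100 of receiving the first audio data sent by the remote control terminal comprises:
Step S110: establishing a Bluetooth connection with the remote control terminal in response to a Bluetooth pairing request sent by the remote control terminal;
Step S120: receiving, over the Bluetooth connection, the first audio data sent by the remote control terminal.
In this embodiment, as one implementation, the remote control terminal has a built-in first Bluetooth module and the controlled device has a built-in second Bluetooth module. The first Bluetooth module establishes a wireless connection with the second Bluetooth module through searching, scanning, and pairing. Over the established Bluetooth connection, the remote control terminal transmits the queue of raw audio data to the controlled device via Bluetooth, that is, the remote control terminal sends the first audio data to the controlled device.
It should be noted that in other embodiments the wireless connection between the remote control terminal and the controlled device is not limited to a Bluetooth connection and may be another type of wireless connection; this embodiment imposes no specific limitation.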
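In addition, an embodiment of the present application further proposes a voice remote control system, the voice remote control system comprising a remote control terminal, a controlled device, and a cloud server;
the remote control terminal is configured to acquire a user voice remote control instruction based on a preset condition, and is further configured to perform analog-to-digital conversion on the voice remote control instruction to obtain first audio data and to send the first audio data to the controlled device;
the controlled device is configured to, after receiving the first audio data sent by the remote control terminal, process the first audio data according to a preset rule to obtain second audio data, and send the second audio data to the cloud server;
the cloud server is configured to, after receiving the second audio data, recognize the second audio data according to a preset recognition rule, generate control command text, and send the control command text to the controlled device;
the controlled device is further configured to receive the control command text issued by the cloud server, parse the control command text to obtain a control command, and execute the control command.
Preferably, the remote control terminal comprises:
a receiving unit configured to receive a start-recording instruction or a stop-recording instruction input by the user, and to forward the received start-recording instruction or stop-recording instruction to:
a recording unit configured to, after receiving the start-recording instruction, detect the user voice remote control instruction and record the detected voice remote control instruction, the recording unit being further configured to, after receiving the stop-recording instruction, stop recording and save the recorded user voice remote control instruction;
a processing unit configured to perform analog-to-digital conversion on the voice remote control instruction to obtain the first audio data;
a sending unit configured to send the first audio data to the controlled device.
In this embodiment, as one implementation, the remote control terminal has a physical voice key or a touch voice key that triggers capture of the user voice remote control instruction. When the user needs to record a voice remote control instruction, the user presses the voice key to start recording and releases it to stop. Only relevant data is therefore collected, which avoids the unnecessary recognition load and transmission bandwidth load that would result from the remote control terminal continuously listening for ambient voice commands, and improves the accuracy of control by voice remote control instructions.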
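The press-to-record behaviour can be sketched as a small recorder that keeps microphone frames only while the voice key is held. The class name `PushToTalkRecorder`, its method names, and the use of strings as stand-ins for audio frames are assumptions made for illustration.

```python
class PushToTalkRecorder:
    """Keep microphone frames only while the voice key is held down."""

    def __init__(self):
        self.recording = False
        self.frames = []

    def press(self):
        """Start-recording instruction: begin a fresh capture."""
        self.recording = True
        self.frames = []

    def feed(self, frame):
        """Called for every microphone frame; dropped unless recording."""
        if self.recording:
            self.frames.append(frame)

    def release(self):
        """Stop-recording instruction: stop and return the saved capture."""
        self.recording = False
        return list(self.frames)


recorder = PushToTalkRecorder()
recorder.feed("background chatter")   # ignored: key not pressed
recorder.press()
recorder.feed("volume")
recorder.feed("up")
captured = recorder.release()
recorder.feed("more chatter")         # ignored: key released
```

Because frames outside the press-to-release window are discarded at the source, nothing irrelevant is digitized or transmitted, which is the bandwidth and recognition saving the paragraph above describes.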
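When running, the components of the voice remote control system proposed in this embodiment implement the steps of the voice remote control method described above, which are not repeated here.
In addition, an embodiment of the present application further proposes a controlled device, the controlled device comprising:
a receiving module configured to receive first audio data sent by a remote control terminal, the first audio data being obtained by the remote control terminal by processing an acquired user voice remote control instruction;
a processing module configured to process the first audio data according to a preset rule to obtain second audio data;
an uploading module configured to send the second audio data to a cloud server;
an execution module configured to receive control command text issued by the cloud server, parse the control command text to obtain a control command, and execute the control command, the control command text being obtained by the cloud server by processing the second audio data.
Preferably, the device further comprises:
a creation module configured to create a Socket connection and send a connection request to the cloud server;
a connection module configured to receive a response of the cloud server to the connection request and establish the Socket connection with the cloud server.
Preferably, the processing module comprises:
an obtaining unit configured to obtain a preset audio optimization standard;
an optimization unit configured to optimize the first audio data based on the obtained audio optimization standard and use the optimized first audio data as the second audio data.
Preferably, the device further comprises:
a first detection module configured to detect whether a preset write instruction has been received;
the receiving module is further configured to receive the first audio data sent by the remote control terminal when the detection result of the first detection module is "Yes".
Preferably, the device further comprises:
a second detection module configured to detect whether a preset read instruction has been received;
the uploading module is further configured to send the second audio data to the cloud server when the detection result of the second detection module is "Yes".
Preferably, the receiving module comprises:
a pairing unit configured to establish a Bluetooth connection with the remote control terminal in response to a Bluetooth pairing request sent by the remote control terminal;
an audio acquisition unit configured to receive, over the Bluetooth connection, the first audio data sent by the remote control terminal.
When running, the modules of the voice remote control device proposed in this embodiment implement the steps of the voice remote control method described above, which are not repeated here.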
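In addition, an embodiment of the present application further proposes a computer-readable storage medium, which may be a non-volatile readable storage medium. The storage medium stores voice remote control computer-readable instructions which, when executed by a processor, implement the following steps:
receiving first audio data sent by a remote control terminal, the first audio data being obtained by the remote control terminal by processing an acquired user voice remote control instruction;
processing the first audio data according to a preset rule to obtain second audio data;
sending the second audio data to a cloud server;
receiving control command text issued by the cloud server, parsing the control command text to obtain a control command, and executing the control command, the control command text being obtained by the cloud server by processing the second audio data.
Further, before the step of sending the second audio data to the cloud server, the voice remote control computer-readable instructions, when executed by the processor, implement the following steps:
creating a Socket connection and sending a connection request to the cloud server; receiving a response of the cloud server to the connection request, and establishing the Socket connection with the cloud server.
Further, the step of processing the first audio data according to a preset rule to obtain second audio data comprises:
obtaining a preset audio optimization standard;
optimizing the first audio data based on the obtained audio optimization standard, and using the optimized first audio data as the second audio data.
Further, before the step of receiving the first audio data sent by the remote control terminal, the voice remote control computer-readable instructions, when executed by the processor, implement the following steps:
detecting whether a preset write instruction has been received;
if so, proceeding to the step of receiving the first audio data sent by the remote control terminal.
Further, before the step of sending the second audio data to the cloud server, the voice remote control computer-readable instructions, when executed by the processor, implement the following steps:
detecting whether a preset read instruction has been received;
if so, proceeding to the step of sending the second audio data to the cloud server.
Further, the step of receiving the first audio data sent by the remote control terminal comprises:
establishing a Bluetooth connection with the remote control terminal in response to a Bluetooth pairing request sent by the remote control terminal;
receiving, over the Bluetooth connection, the first audio data sent by the remote control terminal.
For the method implemented when the voice remote control computer-readable instructions running on the processor are executed, refer to the embodiments of the voice remote control method of the present application; details are not repeated here.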
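It should be noted that, in this document, the terms "comprise", "include", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The serial numbers of the above embodiments of the present application are for description only and do not represent the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structural or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.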

Claims (15)

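  1. A voice remote control method, applied to a controlled device, wherein the voice remote control method comprises the following steps:
    receiving first audio data sent by a remote control terminal, the first audio data being obtained by the remote control terminal by processing an acquired user voice remote control instruction;
    processing the first audio data according to a preset rule to obtain second audio data;
    sending the second audio data to a cloud server;
    receiving control command text issued by the cloud server, parsing the control command text to obtain a control command, and executing the control command, the control command text being obtained by the cloud server by processing the second audio data.
  2. The voice remote control method according to claim 1, wherein, before the step of sending the second audio data to the cloud server, the method further comprises:
    creating a Socket connection and sending a connection request to the cloud server;
    receiving a response of the cloud server to the connection request, and establishing the Socket connection with the cloud server.
  3. The voice remote control method according to claim 2, wherein the step of processing the first audio data according to a preset rule to obtain second audio data comprises:
    obtaining a preset audio optimization standard;
    optimizing the first audio data based on the obtained audio optimization standard, and using the optimized first audio data as the second audio data.
  4. The voice remote control method according to claim 1, wherein, before the step of receiving the first audio data sent by the remote control terminal, the method further comprises:
    detecting whether a preset write instruction has been received;
    if so, proceeding to the step of receiving the first audio data sent by the remote control terminal.
  5. The voice remote control method according to claim 4, wherein, before the step of sending the second audio data to the cloud server, the method further comprises:
    detecting whether a preset read instruction has been received;
    if so, proceeding to the step of sending the second audio data to the cloud server.
  6. The voice remote control method according to claim 1, wherein the step of receiving the first audio data sent by the remote control terminal comprises:
    establishing a Bluetooth connection with the remote control terminal in response to a Bluetooth pairing request sent by the remote control terminal;
    receiving, over the Bluetooth connection, the first audio data sent by the remote control terminal.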
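  7. A voice remote control system, wherein the voice remote control system comprises a remote control terminal, a controlled device, and a cloud server;
    the remote control terminal is configured to acquire a user voice remote control instruction based on a preset condition, and is further configured to perform analog-to-digital conversion on the voice remote control instruction to obtain first audio data and to send the first audio data to the controlled device;
    the controlled device is configured to, after receiving the first audio data sent by the remote control terminal, process the first audio data according to a preset rule to obtain second audio data, and send the second audio data to the cloud server;
    the cloud server is configured to, after receiving the second audio data, recognize the second audio data according to a preset recognition rule, generate control command text, and send the control command text to the controlled device;
    the controlled device is further configured to receive the control command text issued by the cloud server, parse the control command text to obtain a control command, and execute the control command.
  8. The voice remote control system according to claim 7, wherein the remote control terminal comprises:
    a receiving unit configured to receive a start-recording instruction or a stop-recording instruction input by the user, and to forward the received start-recording instruction or stop-recording instruction to:
    a recording unit configured to, after receiving the start-recording instruction, detect the user voice remote control instruction and record the detected voice remote control instruction, the recording unit being further configured to, after receiving the stop-recording instruction, stop recording and save the recorded user voice remote control instruction;
    a processing unit configured to perform analog-to-digital conversion on the voice remote control instruction to obtain the first audio data;
    a sending unit configured to send the first audio data to the controlled device.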
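  9. A controlled device, wherein the controlled device comprises:
    a receiving module configured to receive first audio data sent by a remote control terminal, the first audio data being obtained by the remote control terminal by processing an acquired user voice remote control instruction;
    a processing module configured to process the first audio data according to a preset rule to obtain second audio data;
    an uploading module configured to send the second audio data to a cloud server;
    an execution module configured to receive control command text issued by the cloud server, parse the control command text to obtain a control command, and execute the control command, the control command text being obtained by the cloud server by processing the second audio data.
  10. The controlled device according to claim 9, wherein the device further comprises:
    a creation module configured to create a Socket connection and send a connection request to the cloud server;
    a connection module configured to receive a response of the cloud server to the connection request and establish the Socket connection with the cloud server.
  11. The controlled device according to claim 10, wherein the processing module comprises:
    an obtaining unit configured to obtain a preset audio optimization standard;
    an optimization unit configured to optimize the first audio data based on the obtained audio optimization standard and use the optimized first audio data as the second audio data.
  12. The controlled device according to claim 9, wherein the device further comprises:
    a first detection module configured to detect whether a preset write instruction has been received;
    the receiving module is further configured to receive the first audio data sent by the remote control terminal when the detection result of the first detection module is "Yes".
  13. The controlled device according to claim 12, wherein the device further comprises:
    a second detection module configured to detect whether a preset read instruction has been received;
    the uploading module is further configured to send the second audio data to the cloud server when the detection result of the second detection module is "Yes".
  14. The controlled device according to claim 9, wherein the receiving module comprises:
    a pairing unit configured to establish a Bluetooth connection with the remote control terminal in response to a Bluetooth pairing request sent by the remote control terminal;
    an audio acquisition unit configured to receive, over the Bluetooth connection, the first audio data sent by the remote control terminal.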
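  15. A computer-readable storage medium, wherein the computer-readable storage medium stores voice remote control computer-readable instructions which, when executed by a processor, implement the following steps:
    receiving first audio data sent by a remote control terminal, the first audio data being obtained by the remote control terminal by processing an acquired user voice remote control instruction;
    processing the first audio data according to a preset rule to obtain second audio data;
    sending the second audio data to a cloud server;
    receiving control command text issued by the cloud server, parsing the control command text to obtain a control command, and executing the control command, the control command text being obtained by the cloud server by processing the second audio data.
PCT/CN2019/079991 2018-12-25 2019-03-28 Voice remote control method, system, controlled device, and computer-readable storage medium WO2020133764A1 (zh)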

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811599357.0 2018-12-25
CN201811599357.0A CN109637534A (zh) 2018-12-25 2018-12-25 Voice remote control method, system, controlled device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2020133764A1 (zh) 2020-07-02

Family

ID=66077687

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/079991 WO2020133764A1 (zh) 2018-12-25 2019-03-28 Voice remote control method, system, controlled device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN109637534A (zh)
WO (1) WO2020133764A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
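CN110366018A (zh) * 2019-07-12 2019-10-22 杭州任你说智能科技有限公司 Two-way interactive remote control system for television and operating method
CN112802464A (zh) * 2019-11-14 2021-05-14 阿里巴巴集团控股有限公司 Voice remote control method, remote control terminal, and server
CN111263100A (zh) * 2020-01-19 2020-06-09 中移(杭州)信息技术有限公司 Video call method, apparatus, device, and storage medium
CN111863041B (zh) * 2020-07-17 2021-08-31 东软集团股份有限公司 Sound signal processing method, apparatus, and device
CN112792439A (zh) * 2020-12-30 2021-05-14 唐山松下产业机器有限公司 Voice recognition welding system
CN113205810A (zh) * 2021-05-06 2021-08-03 北京汇钧科技有限公司 Voice signal processing method, apparatus, medium, remote control, and server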

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120030712A1 (en) * 2010-08-02 2012-02-02 At&T Intellectual Property I, L.P. Network-integrated remote control with voice activation
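CN106911949A (zh) * 2015-12-23 2017-06-30 北京奇虎科技有限公司 Method for controlling electrical equipment based on a mobile terminal, and mobile terminal
CN107566226A (zh) * 2017-07-31 2018-01-09 深圳真时科技有限公司 Method, device, and system for controlling a smart home
CN107948695A (zh) * 2017-11-17 2018-04-20 浙江大学 Intelligent voice remote control and television channel selection method
CN108198549A (zh) * 2017-11-22 2018-06-22 珠海格力电器股份有限公司 Device control method, apparatus, storage medium, server, and user terminal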

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
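CN101938391A (zh) * 2010-08-31 2011-01-05 中山大学 Voice processing method, system, remote control, set-top box, and cloud server
CN107635214B (zh) * 2017-08-21 2019-10-01 深圳创维-Rgb电子有限公司 Bluetooth-remote-control-based response method, device, system, and readable storage medium
CN108121528A (zh) * 2017-12-06 2018-06-05 深圳市欧瑞博科技有限公司 Voice control method, device, server, and computer-readable storage medium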

Also Published As

Publication number Publication date
CN109637534A (zh) 2019-04-16

Similar Documents

Publication Publication Date Title
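WO2020133764A1 (zh) Voice remote control method, system, controlled device, and computer-readable storage medium
WO2013105782A1 (en) Image display apparatus and method of controlling the same
WO2019088802A1 (ko) Electronic device and method for executing a function by voice between electronic devices
WO2019041856A1 (zh) Home appliance control method, system, control terminal, and storage medium
WO2014051367A1 (en) User terminal apparatus, electronic device, and method for controlling the same
WO2020133741A1 (zh) Method for controlling a peripheral device, television, and readable storage medium
WO2020155090A1 (zh) Bluetooth audio transmission method, Bluetooth transceiver, and computer-readable storage medium
WO2019223600A1 (zh) Bluetooth audio transmission method, device, and computer-readable storage medium
WO2019190073A1 (en) Electronic device and control method thereof
WO2014069820A1 (en) Broadcast receiving apparatus, server and control methods thereof
WO2015194693A1 (ko) Image display device and operating method thereof
WO2020145631A1 (en) Content reproducing apparatus and content reproducing method
WO2015002384A1 (en) Server, control method thereof, image processing apparatus, and control method thereof
WO2021002611A1 (en) Electronic apparatus and control method thereof
WO2016119414A1 (zh) Two-dimensional-code-based file sharing method, system, and mobile terminal
WO2020091183A1 (ko) Electronic device for sharing user-specific voice commands and control method thereof
WO2016111502A1 (en) System and method for transmitting information about task to external device
WO2018024012A1 (zh) Remote control and projection method
WO2019233190A1 (zh) Display-terminal-based text-to-speech method, display terminal, and storage medium
WO2018076873A1 (zh) Data sharing method, device, medium, electronic device, and system
WO2021060575A1 (ko) Artificial intelligence server and operating method thereof
WO2015186857A1 (ko) Image display device and operating method thereof
WO2017096955A1 (zh) Incoming call reminder setting method, device, and related equipment
WO2019112332A1 (en) Electronic apparatus and control method thereof
WO2017171304A1 (ko) Hands-free device for PTT communication, and PTT communication system and method using same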

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19902734

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19902734

Country of ref document: EP

Kind code of ref document: A1