WO2022268136A1 - Terminal device and server for voice control - Google Patents

Publication number
WO2022268136A1
WO2022268136A1 PCT/CN2022/100547 CN2022100547W WO2022268136A1 WO 2022268136 A1 WO2022268136 A1 WO 2022268136A1 CN 2022100547 W CN2022100547 W CN 2022100547W WO 2022268136 A1 WO2022268136 A1 WO 2022268136A1
Authority
WO
WIPO (PCT)
Prior art keywords
terminal device
voice
server
command
terminal
Prior art date
Application number
PCT/CN2022/100547
Other languages
French (fr)
Chinese (zh)
Inventor
王冰
李含珍
张路伟
Original Assignee
海信视像科技股份有限公司
聚好看科技股份有限公司
Priority claimed from CN202110688867.0A (CN113450792A)
Priority claimed from CN202110917713.4A (CN115910050A)
Priority claimed from CN202111521226.2A (CN114172757A)
Priority claimed from CN202210151526.4A (CN114609920A)
Application filed by 海信视像科技股份有限公司, 聚好看科技股份有限公司
Priority to CN202280038248.XA (CN117882130A)
Publication of WO2022268136A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • The present application relates to the technical field of voice interaction, and in particular to a terminal device and a server for voice control.
  • With the development of voice interaction technology, more and more home terminal devices have a voice interaction function. Using this function, the user can control these terminal devices by voice to perform corresponding operations, such as starting and stopping.
  • In the voice control process, the user inputs a voice signal; after the terminal device collects the voice signal, it converts the voice signal into a corresponding instruction, so that the terminal device performs the corresponding operation according to the instruction.
  • The voice interaction functions of most terminal devices are limited by distance, so users cannot control the device they want from anywhere in the home. For example, the smart TV in the bedroom cannot be turned on or off by voice from the kitchen, and the temperature of the air conditioner in the bedroom cannot be adjusted by voice from the living room. To control the terminal device, the user has to move within the effective distance or speak louder, resulting in a poor user experience.
  • This embodiment provides a terminal device and a server for voice control. The terminal device includes: a voice collector configured to collect a voice signal input by a user; and a controller configured to: receive the voice signal from the voice collector, send the voice signal to a server, and receive a voice command from the server, wherein the voice command is generated according to the voice signal; when the terminal device can perform the operation corresponding to the voice command, perform that operation in response to the voice command; and when the terminal device cannot perform the operation corresponding to the voice command, generate an instruction distribution request and send the instruction distribution request to the server.
  • The server is configured to search, according to the instruction distribution request, for other terminal devices capable of performing the operation corresponding to the voice command, and to send the voice command to those terminal devices.
  • FIG. 1 is a schematic diagram of a voice interaction principle provided by an embodiment of the present application.
  • FIG. 2 is a schematic framework diagram of a voice control system of a terminal device provided in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a scenario of a voice control system of a terminal device provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a scene of another voice control system of a terminal device provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a scene of another voice control system of a terminal device provided by an embodiment of the present application.
  • FIG. 6 is a signaling diagram of a voice control method for a terminal device provided in an embodiment of the present application.
  • FIG. 7 is a usage scenario diagram of a voice control system provided by an embodiment of the present application.
  • FIG. 8 is a hardware configuration diagram of a terminal device provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a voice interaction process provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of multiple terminal devices responding to voice interaction effects provided by an embodiment of the present application.
  • FIG. 11 is a schematic flowchart of a multi-device voice wake-up method provided by an embodiment of the present application.
  • FIG. 12 is a schematic flow diagram of a screening process for a second terminal device provided in an embodiment of the present application.
  • FIG. 13 is a schematic flowchart of determining a second terminal device according to the number of devices provided by an embodiment of the present application.
  • FIG. 14 is a schematic flow diagram of marking a master device provided by an embodiment of the present application.
  • FIG. 15 is a schematic diagram of a process flow for updating device status provided by an embodiment of the present application.
  • FIG. 16 is a server-side sequence flow chart of a multi-device voice wake-up method provided by an embodiment of the present application.
  • FIG. 17 is a sequence flow chart on the terminal device side of a multi-device voice wake-up method provided by an embodiment of the present application.
  • FIG. 18 is an application scenario diagram of a terminal device provided in an embodiment of the present application.
  • FIG. 19 is another application scenario diagram of a terminal device provided by an embodiment of the present application.
  • FIG. 20 is a schematic flowchart of a voice control method provided in an embodiment of the present application.
  • FIG. 21 is a schematic flowchart of a screening terminal device provided by an embodiment of the present application.
  • FIG. 22 is a schematic diagram of a scene of a voice control process provided by an embodiment of the present application.
  • FIG. 23A is a schematic flowchart of a voice control method provided by an embodiment of the present application.
  • FIG. 23B is a schematic diagram of the principle of a voice control method provided in the embodiment of the present application.
  • FIG. 23C is a schematic diagram of the process of determining the second set of candidate terminal devices in the embodiment of the present application.
  • FIG. 24A is a schematic flowchart of another terminal home control method provided by the embodiment of the present application.
  • FIG. 24B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application.
  • FIG. 25A is a schematic flowchart of another voice control method provided by the embodiment of the present application.
  • FIG. 25B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application.
  • FIG. 26A is a schematic flowchart of another voice control method provided by the embodiment of the present application.
  • FIG. 26B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application.
  • FIG. 26C is a schematic diagram of the process of obtaining control information in the embodiment of the present application.
  • FIG. 27A is a schematic structural diagram of a local control device in an embodiment of the present application.
  • FIG. 27B is a schematic structural diagram of interaction between a local control device and a terminal device in an embodiment of the present application.
  • FIG. 1 is a schematic diagram of a voice interaction principle provided by an embodiment of the present application.
  • The smart device is used to receive input information and output the processing results of that information.
  • The speech recognition service device is an electronic device on which a speech recognition service is deployed.
  • The semantic service device is an electronic device on which a semantic service is deployed.
  • The business service device is an electronic device on which a business service is deployed.
  • The electronic device here may include a server, a computer, and the like.
  • The speech recognition service, semantic service (also called a semantic engine) and business service here are web services that can be deployed on the electronic device. The speech recognition service is used to recognize audio as text, the semantic service is used to perform semantic analysis on the text, and the business service is used to provide specific services, such as the weather query service of Moji Weather or the music query service of QQ Music.
  • There may be multiple entity service devices deployed with different business services in the architecture shown in FIG. 1, or one or more functional services may be integrated in one or more entity service devices.
  • The following describes, as an example, how information input to the smart device is processed based on the architecture shown in FIG. 1. Taking a query sentence input by voice as the input information, the processing may include the following three processes:
  • After receiving the query sentence input by voice, the smart device can upload the audio of the query sentence to the speech recognition service device, so that the speech recognition service device can recognize the audio as text through the speech recognition service and return the text to the smart device.
  • Before uploading the audio of the query sentence to the speech recognition service device, the smart device may perform denoising processing on the audio, where the denoising processing may include steps such as removing echo and environmental noise.
  • The smart device uploads the text of the query sentence recognized by the speech recognition service to the semantic service device, so that the semantic service device can perform semantic analysis on the text through the semantic service to obtain the business field and intention of the text.
  • The semantic service device sends a query instruction to the corresponding business service device to obtain the query result given by the business service.
  • The smart device can obtain the query result from the semantic service device and output it.
  • The semantic service device can also send the semantic analysis result of the query sentence to the smart device, so that the smart device can output the feedback sentence in the semantic analysis result.
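The three processes above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the function names, the `SemanticResult` type, the keyword-based "semantic analysis" and the canned weather answer are all hypothetical stand-ins, not the services described in the application.

```python
from dataclasses import dataclass


@dataclass
class SemanticResult:
    domain: str   # business field inferred from the text, e.g. "weather"
    intent: str   # intention inferred from the text, e.g. "query"
    text: str     # the recognized query sentence


def recognize_speech(audio: bytes) -> str:
    """Stand-in for the speech recognition service (audio -> text).

    Assumption: the 'audio' in this sketch is simply UTF-8 encoded text.
    """
    return audio.decode("utf-8")


def analyze_semantics(text: str) -> SemanticResult:
    """Stand-in for the semantic service: a trivial keyword check."""
    domain = "weather" if "weather" in text.lower() else "unknown"
    return SemanticResult(domain=domain, intent="query", text=text)


# Hypothetical business services, keyed by business field.
BUSINESS_SERVICES = {
    "weather": lambda result: "Sunny, 25 degrees",
}


def handle_query(audio: bytes) -> str:
    """Run the three processes in order and return what the device outputs."""
    text = recognize_speech(audio)                 # process 1: speech recognition
    result = analyze_semantics(text)               # process 2: semantic analysis
    service = BUSINESS_SERVICES.get(result.domain)
    if service is None:
        return "Sorry, I cannot help with that."   # feedback sentence
    return service(result)                         # process 3: business response
```

Calling `handle_query(b"what is the weather today")` goes through all three stages, while a sentence with no known business field falls back to the feedback sentence.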
  • FIG. 1 is only an example, and does not limit the protection scope of the present application. In the embodiment of the present application, other architectures may also be used to implement similar functions. For example, all or part of the three processes may be completed by a smart terminal, which will not be described in detail here.
  • The smart device shown in FIG. 1 can be a display device, such as a smart TV. The function of the speech recognition service device can be realized by the cooperation of the sound collector and the controller provided on the display device, and the functions of the semantic service device and the business service device can be realized by the controller of the display device or by the server of the display device.
  • FIG. 2 is a schematic framework diagram of a voice control system for a terminal device provided in an embodiment of the present application.
  • The system includes at least two terminal devices 200 and a server 400.
  • The terminal device 200 is configured to collect voice signals input by the user.
  • The terminal device 200 communicates with the server 400.
  • The server 400 is configured to receive a signal or a request sent by the terminal device 200, and to feed back corresponding instructions to the terminal device 200.
  • The sound collector of the terminal device 200-1 collects the voice signal input by the user. The terminal device 200-1 then sends the collected voice signal to the server 400.
  • The server 400 generates a voice command according to the voice signal. It should be noted that the server 400 uses the semantic system to convert the voice signal into a voice command; the specific conversion process is not limited in this application.
  • The server 400 feeds back the voice command to the terminal device 200-1.
  • The local execution capability module of the terminal device 200-1 judges whether the terminal device is capable of performing the operation corresponding to the voice command. If the judging result is that the terminal device is capable of performing the operation, the voice command is sent to the controller. In response to the voice command, the controller controls the terminal device 200-1 to perform the operation corresponding to the voice command.
  • If the judging result is that the terminal device is not capable of performing the operation, an instruction distribution request carrying the voice command is generated according to the voice command. The instruction distribution request is then sent to the server 400.
  • After receiving the instruction distribution request, the server 400 searches for a terminal device 200 that can execute the voice command. For example, if it is found that the terminal device 200-2 can perform the operation corresponding to the voice command, the voice command is sent to the terminal device 200-2, so that the controller of the terminal device 200-2, in response to the voice command, controls the terminal device 200-2 to perform the operation corresponding to the voice command.
  • The user inputs a voice signal "turn on the TV" near the smart speaker. The TV is in the bedroom, but the smart speaker is in the kitchen.
  • After receiving the voice signal "turn on the TV", the smart speaker sends the voice signal to the server 400.
  • The server converts the voice signal into a voice command, and feeds the voice command back to the smart speaker.
  • Since the smart speaker cannot perform the operation of "turning on the TV", the smart speaker sends an instruction distribution request carrying the voice command "turn on the TV" to the server 400.
  • After receiving the instruction distribution request, the server 400 searches for a terminal device capable of executing the voice command "turn on the TV". If it is found that the terminal device capable of executing the voice command "turn on the TV" is the TV, the voice command "turn on the TV" is sent to the TV in the bedroom. After the TV in the bedroom receives the voice command "turn on the TV", it responds to the voice command and performs a power-on operation. In this way, the TV in the bedroom can be turned on by voice without the user being in the bedroom.
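The smart speaker and TV scenario can be sketched in Python as follows. This is a minimal illustration, not the implementation in the application: the `Server` and `TerminalDevice` classes, their method names, and the use of plain strings as voice commands are all assumptions made for the example.

```python
class Server:
    """Hypothetical server that forwards commands to capable devices."""

    def __init__(self):
        self.devices = {}

    def register(self, device):
        self.devices[device.name] = device

    def distribute(self, command, requester):
        """Handle an instruction distribution request from `requester`."""
        for device in self.devices.values():
            if device is not requester and device.can_execute(command):
                device.execute(command)
                return device.name
        return None  # no other terminal device can execute the command


class TerminalDevice:
    """Hypothetical terminal device with a set of supported commands."""

    def __init__(self, name, capabilities, server):
        self.name = name
        self.capabilities = set(capabilities)
        self.server = server
        self.executed = []          # record of commands this device performed
        server.register(self)

    def can_execute(self, command):
        return command in self.capabilities

    def execute(self, command):
        self.executed.append(command)

    def on_voice_command(self, command):
        """Execute locally when possible, otherwise ask the server to distribute."""
        if self.can_execute(command):
            self.execute(command)
            return self.name
        return self.server.distribute(command, requester=self)


server = Server()
speaker = TerminalDevice("kitchen speaker", {"play music"}, server)
tv = TerminalDevice("bedroom TV", {"turn on the TV"}, server)
handled_by = speaker.on_voice_command("turn on the TV")
```

Here `handled_by` names the bedroom TV: the speaker cannot execute the command itself, so the request is forwarded and the TV records the operation.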
  • each terminal device is configured with a local execution capability filtering module, and the local execution capability filtering module is configured with local capability attribute parameters.
  • the specific steps for the terminal device to determine whether the machine is capable of performing the corresponding operation of the voice command are as follows:
  • the native capability attribute parameter is matched with the pending capability attribute parameter. If the native capability attribute parameter matches the pending capability attribute parameter, it means that the terminal device can execute the operation corresponding to the voice command. If the local capability attribute parameter does not match the pending capability attribute parameter, it means that the terminal device cannot perform the operation corresponding to the voice command.
  • The terminal device 200-1 is a display device, the terminal device 200-2 is an air conditioner, the terminal device 200-3 is a washing machine, and the terminal device 200-4 is a refrigerator. The local capability attribute parameter of the terminal device 200-1 is then playing audio and video, that of the terminal device 200-2 is cooling and heating, that of the terminal device 200-3 is laundry, and that of the terminal device 200-4 is cooling.
  • The terminal device 200-2 collects the voice signal and receives the voice command "heating" sent by the server 400. The capability attribute parameter to be processed, "heating", is parsed from the voice command. The local capability filtering module of the terminal device 200-2 then obtains the local capability attribute parameter "heating" of the terminal device 200-2. The local capability attribute parameter matches the capability attribute parameter to be processed, which means that the terminal device 200-2 can perform the operation corresponding to the voice command "heating".
  • The local capability filtering module of the present application can also convert the parsed capability attribute parameter to be processed. For example, the user may input, within the signal receiving range of the terminal device 200-2, a voice signal that asks for heating in wording that differs from the stored parameter. After parsing, the resulting capability attribute parameter to be processed does not exactly match the local capability attribute parameter "heating" of the terminal device 200-2.
  • The local capability filtering module can analyze the capability attribute parameter to be processed and determine that it has the same meaning as "heating". The capability attribute parameter to be processed is therefore considered to match the local capability attribute parameter. That is, the terminal device 200-2 can perform the operation corresponding to the voice signal.
  • The terminal device 200-2 collects the voice signal and receives the voice command "play music" sent by the server 400. The capability attribute parameter to be processed, "play music", is parsed from the voice command. The local capability filtering module of the terminal device 200-2 then obtains the local capability attribute parameters "cooling" and "heating" of the terminal device 200-2. The local capability attribute parameters do not match the capability attribute parameter to be processed, which means that the terminal device 200-2 cannot perform the operation corresponding to the voice command "play music".
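The matching performed by the local capability filtering module, including the conversion of differently worded requests, might look like the following Python sketch. The synonym table and the function names are hypothetical; the application does not specify how equivalent wordings are recognized.

```python
# Hypothetical table mapping alternative wordings to a canonical capability.
SYNONYMS = {
    "warm up": "heating",
    "cool down": "cooling",
}


def normalize(capability: str) -> str:
    """Convert a parsed capability to its canonical wording, if one is known."""
    key = capability.strip().lower()
    return SYNONYMS.get(key, key)


def can_execute(local_capabilities, pending_capability) -> bool:
    """Return True when the pending capability matches a local capability."""
    local = {normalize(c) for c in local_capabilities}
    return normalize(pending_capability) in local


air_conditioner = ["cooling", "heating"]   # local capability attribute parameters
```

With these definitions, `can_execute(air_conditioner, "heating")` and `can_execute(air_conditioner, "Warm up")` both succeed, while `can_execute(air_conditioner, "play music")` fails, which is the case where the device would send an instruction distribution request.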
  • The server searches, according to the instruction distribution request, for the second terminal device corresponding to the device name, and sends the voice command to the second terminal device, so that the second terminal device performs the corresponding operation in response to the voice command.
  • FIG. 3 is a schematic diagram of a voice control system for a terminal device provided in the embodiment of the present application.
  • Device 3 and other voice commands. These voice commands include only the device name "device 3". If the device name of device 1 does not match the device name included in the voice command, device 1 cannot perform the operation corresponding to the voice command.
  • The server searches for a terminal device with a matching name based on the device name. It finds that the device name of device 3 matches the device name included in the voice command, and sends the voice command to device 3. After device 3 receives the voice command "turn on the device 3", it responds to the voice command and performs the start operation. Alternatively, after device 3 receives the voice command "shut down the device 3", it responds to the voice command and performs the shutdown operation.
  • the second terminal device can also use the local capability filtering module to reconfirm whether the machine can perform the operation corresponding to the voice command. If it is confirmed again that the machine can perform the operation corresponding to the voice command, the corresponding operation is performed in response to the voice command. If it is reconfirmed that the machine cannot perform the operation corresponding to the voice command, the second terminal device can feed back an error signal to the server, so that the server can search for a terminal device that can perform the corresponding operation of the voice command again.
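The reconfirmation step can be sketched as a search that continues when a candidate reports an error. In this Python illustration the server's view of device capabilities is deliberately allowed to be out of date; the `Device` class, the `server_view` dictionary and the boolean error signal are assumptions made for the example.

```python
class Device:
    """Hypothetical terminal device that reconfirms capability locally."""

    def __init__(self, name, capabilities):
        self.name = name
        self.capabilities = set(capabilities)

    def handle(self, command: str) -> bool:
        """Return True if the command was executed, False as an error signal."""
        return command in self.capabilities


def distribute(command, server_view, devices):
    """Try each device the server believes is capable until one succeeds.

    server_view maps device name -> capabilities recorded on the server,
    which may differ from what the device can actually do.
    """
    for name, believed in server_view.items():
        if command not in believed:
            continue
        if devices[name].handle(command):
            return name
        # Error signal received: keep searching for another device.
    return None


devices = {
    "device 2": Device("device 2", {"play music"}),
    "device 3": Device("device 3", {"play music", "play video"}),
}
# The server wrongly believes device 2 can also play video.
server_view = {
    "device 2": {"play music", "play video"},
    "device 3": {"play music", "play video"},
}
chosen = distribute("play video", server_view, devices)
```

Device 2 rejects the command after its local check, so the search moves on and `chosen` ends up as device 3.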
  • The server searches, according to the instruction distribution request, for a second terminal device having the device capability, and sends the voice command to the second terminal device, so that the second terminal device performs the corresponding operation in response to the voice command.
  • FIG. 4 is a schematic diagram of another voice control system scenario of a terminal device provided in the embodiment of the present application.
  • The user inputs voice commands such as "reduce the temperature", "adjust the temperature to 20 degrees", "increase temperature" or "increase wind speed" within the signal receiving range of the speaker device.
  • These voice commands include only device capabilities. If the local capability attribute parameter of the speaker device does not conform to the device capability parameter included in the voice command, the speaker device cannot perform the operation corresponding to the voice command.
  • The speaker device sends an instruction distribution request to the server, and the server searches, according to the instruction distribution request, for a terminal device that meets the device capability parameter. Among the devices shown in FIG. 4, only the local capability attribute parameter of the air conditioner conforms to the device capability parameter. The server therefore sends the voice command to the air conditioner, and the air conditioner performs the corresponding operation in response to the voice command after receiving it.
  • The server sends the voice command to all of the terminal devices that meet the conditions, and each of these terminal devices performs the corresponding operation in response to the voice command.
  • The user inputs "lower temperature" within the signal receiving range of the air conditioner, and the local capability filtering module of the air conditioner first judges, according to the local capability attribute parameters, that the air conditioner can perform the operation corresponding to the voice command. The air conditioner also sends an instruction distribution request to the server. According to the device capability parameter carried in the instruction distribution request, the server searches for terminal devices other than the air conditioner that also meet the device capability parameter, that is, terminal devices that can perform the operation corresponding to the voice command "lower temperature". The server finds that the refrigerator can also perform the operation corresponding to the voice command "lower temperature", and sends the voice command to the refrigerator, so that the refrigerator performs the corresponding operation in response to the voice command. Through this embodiment, the user can input a voice command once and control multiple terminal devices at the same time.
  • voice commands that include both device name and device capability parameters.
  • the voice command includes the device name "air conditioner” and the device capability parameter "reduce the temperature”. Then the air conditioner judges through the local capability filtering module that the local machine can perform the operation corresponding to the voice command, and at the same time, the device name of the air conditioner matches the device name carried in the voice command. Therefore, the air conditioner no longer sends an instruction distribution request to the server, and the server no longer searches for other terminal devices.
  • the server searches for a matching second terminal according to the custom rules.
  • the voice command has a corresponding relationship with the terminal device.
  • FIG. 5 is a schematic diagram of another voice control system scenario of a terminal device provided in the embodiment of the present application.
  • Device 4 gives priority to playing audio novels, etc., that is, the instruction to play music corresponds to Device 2, the instruction to play movies and TV corresponds to Device 3, and the instruction to play audio novels corresponds to Device 4.
  • The local capability filtering module of device 1 first judges that device 1 cannot perform the operation corresponding to the voice command, and device 1 then sends an instruction distribution request to the server.
  • the server finds that the terminal device corresponding to the voice command "play music" is device 2, and then sends the voice command to device 2.
  • the server finds that the terminal device corresponding to the voice command "play video" is device 3, and then sends the voice command to device 3.
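A custom rule table of this kind can be sketched as a simple lookup. The Python below is only an illustration: the rule keys are taken from the examples in the text, while the substring-based matching and the function name `route` are assumptions.

```python
# Custom rules: which device is preferred for which kind of instruction.
CUSTOM_RULES = {
    "play music": "device 2",
    "play video": "device 3",
    "play audio novels": "device 4",
}


def route(command: str):
    """Return the device bound to the first rule contained in the command."""
    text = command.lower()
    for phrase, device in CUSTOM_RULES.items():
        if phrase in text:
            return device
    return None  # no custom rule applies to this command
```

For instance, `route("play music")` yields device 2 and `route("play video")` yields device 3, matching the two cases described above.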
  • the server includes a fusion capability rules database and an instruction distribution module.
  • The local capability attribute parameters of all devices are stored in the fusion capability rule database. Operators can update a device's local capability attribute parameters in the fusion capability rule database. For example, if a terminal device has been updated to have a new capability, a corresponding local capability attribute parameter needs to be added for that device.
  • all devices are stored according to device name and device ID.
  • the instruction distribution module receives the instruction distribution request sent by the terminal device, and can parse the capability attribute parameter to be processed from the voice instruction carried in the instruction distribution request. Afterwards, the command distribution module searches the fusion capability rule database for the local capability attribute parameters that match the capability attribute parameters to be processed, so as to find the terminal device that can perform the corresponding operation of the voice command.
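One way to picture the fusion capability rule database and the instruction distribution module is the Python sketch below. The class names mirror the terms used in the text, but the record layout, the `upsert` method and the shape of the request dictionary are hypothetical, and the voice instruction text is used directly as the capability to be processed.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceRecord:
    device_id: str
    name: str
    capabilities: set = field(default_factory=set)


class FusionCapabilityRuleDatabase:
    """Stores devices by ID and name together with their capabilities."""

    def __init__(self):
        self._records = {}

    def upsert(self, device_id, name, capabilities=()):
        """Add a device, or add new capabilities to an existing device."""
        record = self._records.setdefault(device_id, DeviceRecord(device_id, name))
        record.name = name
        record.capabilities.update(capabilities)

    def find_by_capability(self, capability):
        return [r.device_id for r in self._records.values()
                if capability in r.capabilities]


class InstructionDistributionModule:
    def __init__(self, database):
        self.database = database

    def handle_request(self, request: dict):
        """Return the IDs of devices able to execute the carried instruction."""
        # Assumption: the instruction text itself is the pending capability.
        pending = request["voice_instruction"]
        return self.database.find_by_capability(pending)


db = FusionCapabilityRuleDatabase()
db.upsert("id-1", "display", {"play audio and video"})
db.upsert("id-2", "air conditioner", {"cooling", "heating"})
db.upsert("id-4", "refrigerator", {"cooling"})
module = InstructionDistributionModule(db)
```

A request carrying "heating" resolves to the air conditioner only, while "cooling" resolves to both the air conditioner and the refrigerator. Calling `upsert` again for an existing ID models an operator adding a capability after a device update.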
  • the native capability filtering module of the terminal device may also search for native capability attribute parameters from the fusion capability rule database.
  • the user when the user inputs vague voice commands, there may be multiple terminal devices that can perform operations corresponding to the voice commands.
  • the vague voice command may be a vague device control command, a vague media asset playback command, and the like.
  • the voice command can be directly sent to the air conditioner in the living room, so that the air conditioner in the living room can be activated.
  • both the air conditioner in the living room and the air conditioner in the bedroom can perform the corresponding operation of the voice command. Therefore other properties can be set to target more specific devices. For example, formulate time rules: turn on the air conditioner in the living room from 11:00 to 14:00, and turn on the air conditioner in the bedroom from 15:00 to 17:00.
  • the voice command is sent to the air conditioner in the living room according to the time rule, so that the air conditioner in the living room performs the start operation.
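The time rule can be expressed as a small lookup over time windows. In this Python sketch the two windows come from the example above; treating both ends of each window as inclusive and returning `None` outside every window are assumptions.

```python
from datetime import time

# (start, end, device) windows taken from the example time rules.
TIME_RULES = [
    (time(11, 0), time(14, 0), "living room air conditioner"),
    (time(15, 0), time(17, 0), "bedroom air conditioner"),
]


def pick_device(now: time):
    """Return the device whose time window contains `now`, if any."""
    for start, end, device in TIME_RULES:
        if start <= now <= end:   # assumption: both ends are inclusive
            return device
    return None
```

A command received at 12:30 is routed to the living room air conditioner, and one received at 16:00 to the bedroom air conditioner.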
  • the voice command input by the user may include multiple matching items, for example, may include device name, device response time period, space where the device exists, device capability parameters, and the like. Different terminal devices may satisfy the matching items included in the voice command at the same time.
  • the voice command includes four matching items: the device name, the device response time period, the space where the device exists, and the device capability parameter.
  • Device 1 meets the matching items device name and time period, and device 2 meets the matching items space and device capability parameter.
  • the corresponding weight value can be set for each matching item.
  • The weight value of the device name is 10, the weight value of the time period is 5, the weight value of the space is 3, and the weight value of the device capability parameter is 8. According to formula (1), $W = \sum_{i} a_i$, where $a_i$ is the weight value of the $i$-th matching item that a terminal device meets, the final weight values of device 1 and device 2 are respectively obtained.
  • the total value of the weight attribute of device 1 is 15, and the total value of the weight attribute of device 2 is 11.
  • the server sends the voice command to the device 1, so that the device 1 performs corresponding operations in response to the voice command.
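The weighted selection can be checked with a few lines of Python. The weights are those given in the text; the dictionary keys, the function names and the second candidate ("device X", which meets the space and device capability items) are hypothetical and used only to show the comparison.

```python
# Weight value of each matching item, as given in the text.
WEIGHTS = {
    "device_name": 10,
    "time_period": 5,
    "space": 3,
    "device_capability": 8,
}


def total_weight(matched_items):
    """Sum the weight values of the matching items a device meets."""
    return sum(WEIGHTS[item] for item in matched_items)


def pick_best(candidates):
    """Return the name of the candidate with the highest total weight."""
    return max(candidates, key=lambda name: total_weight(candidates[name]))


candidates = {
    "device 1": ["device_name", "time_period"],   # 10 + 5
    "device X": ["space", "device_capability"],   # 3 + 8 (hypothetical device)
}
best = pick_best(candidates)
```

Device 1 totals 15 and wins over the hypothetical device at 11, so the voice command would be sent to device 1. How ties are broken is not stated in the text; `max` simply keeps the first candidate it encounters.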
  • the servers in this application can be divided into semantic servers and instruction distribution servers.
  • the semantic server is used to recognize voice commands from voice signals input by users.
  • the instruction distribution server stores a fusion capability rule database, which is used to search for terminal devices that can perform operations corresponding to voice instructions according to the instruction distribution request.
  • the semantic server can be a web server, while the instruction distribution server is a local server. Since the local server has the advantage of fast response, the response speed of the entire voice control process can be improved.
  • FIG. 6 is a signaling diagram of a voice control method for a terminal device provided in an embodiment of the present application.
  • As shown in FIG. 6, the method comprises the following steps:
  • S101 The microphone of the first terminal device receives a voice signal input by a user.
  • S102 The first terminal device sends the voice signal to the server.
  • S103 The server generates a voice command according to the voice signal, and feeds back the voice command to the first terminal device.
  • S104 The first terminal device determines whether it can itself perform the operation corresponding to the voice command.
  • After receiving the instruction distribution request, the server searches for a second terminal device that can perform the operation corresponding to the voice instruction according to the instruction distribution request, and sends the voice instruction to the second terminal device.
  • S108 The second terminal device performs a corresponding operation in response to the voice instruction.
  • the specific process by which the first terminal device determines whether it can itself perform the operation corresponding to the voice command is as follows:
  • the local capability filtering module of the first terminal device may acquire the native capability attribute parameters from the fusion capability rule database. The native capability attribute parameters are then matched against the pending capability attribute parameter. If they match, the first terminal device can execute the operation corresponding to the voice command; if they do not match, it cannot.
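  • The local check and the hand-over to a second terminal device can be sketched as follows. This is a simplified Python illustration under stated assumptions: the `Device` class, `handle_command`, and the capability strings are hypothetical, and the loop over `others` only stands in for the instruction distribution request that the server processes.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical terminal device with a set of native capability parameters."""
    name: str
    capabilities: set
    executed: list = field(default_factory=list)

    def can_execute(self, pending_capability):
        # Local capability filtering: match native against pending capability.
        return pending_capability in self.capabilities

    def execute(self, pending_capability):
        self.executed.append(pending_capability)


def handle_command(first, others, pending_capability):
    """Run locally when possible; otherwise hand over to another device."""
    if first.can_execute(pending_capability):
        first.execute(pending_capability)
        return first.name
    # Stand-in for the instruction distribution request handled by the server.
    for device in others:
        if device.can_execute(pending_capability):
            device.execute(pending_capability)
            return device.name
    return None


speaker = Device("speaker", {"play music"})
air_conditioner = Device("air conditioner", {"decrease temperature"})
print(handle_command(speaker, [air_conditioner], "decrease temperature"))
```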
  • when the matching item is a device name, the server searches for the second terminal device corresponding to that device name. For example, if the voice instruction is "turn on the speaker", the server searches for the speaker device according to the device name "speaker".
  • when the matching item is a device capability parameter, the server searches for a second terminal device that has that capability parameter. For example, if the voice command is "decrease temperature", the capability parameter of the device to be processed is recognized as "decrease temperature".
  • the instruction distribution module of the server can search the fusion capability rule database for the native capability attribute parameter matching the capability parameter of the device to be processed. That is, the terminal device that can perform the operation corresponding to the voice command is found.
  • the server searches for the second terminal device corresponding to the custom rule.
  • the custom rules include: device 2 gives priority to playing music, device 3 gives priority to playing videos, and device 4 gives priority to playing audio novels, etc. If the voice command input by the user is "play music", then the device 2 corresponds to the custom rule, and the device 2 is determined as the second terminal device.
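  • The three lookups described above (by device name, by device capability parameter, and by custom rule) can be sketched as follows. This is an illustrative Python sketch: `REGISTRY` is a hypothetical stand-in for the fusion capability rule database, and `CUSTOM_RULES`, the function names, and the capability strings are assumptions made for the example.

```python
# Hypothetical stand-in for the fusion capability rule database:
# device name -> capability parameters the device supports.
REGISTRY = {
    "speaker": {"play music", "turn on"},
    "air conditioner": {"decrease temperature", "turn on"},
}

# Hypothetical user-defined rules: requested operation -> preferred device.
CUSTOM_RULES = {
    "play music": "device 2",
    "play video": "device 3",
    "play audio novel": "device 4",
}


def find_by_name(command):
    """Return the first registered device whose name appears in the command."""
    for name in REGISTRY:
        if name in command:
            return name
    return None


def find_by_capability(capability):
    """Return every registered device that supports the capability."""
    return sorted(name for name, caps in REGISTRY.items() if capability in caps)


def find_by_custom_rule(command):
    """Return the device that a custom rule assigns to the command, if any."""
    return CUSTOM_RULES.get(command)


print(find_by_name("turn on the speaker"))
print(find_by_capability("decrease temperature"))
print(find_by_custom_rule("play music"))
```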
  • each matching item is assigned a weight attribute value. If, when searching according to the matching items, the server finds at least two terminal devices that each satisfy at least one matching item, it calculates for each of these terminal devices the total weight attribute value of all the matching items the device satisfies, that is, the sum of the weight values. The terminal device with the largest total weight attribute value is determined to be the second terminal device.
  • the voice control system in the embodiment of the present application is a network system established based on a specific area network and based on a unified control service.
  • the voice control system may include a plurality of terminal devices 200 that establish communication connections with each other. Multiple terminal devices 200 can realize the communication connection relationship between the devices by accessing the same local area network. A plurality of terminal devices 200 can also directly form a point-to-point network through a unified communication protocol to realize a communication connection. For example, multiple terminal devices 200 may communicate with each other by connecting to the same wireless local area network. For another example, one terminal device 200 may also establish communication connections with other multiple terminal devices 200 through Bluetooth, infrared, cellular network, power carrier communication and other means.
  • the terminal device 200 refers to a device having a communication function, capable of receiving, sending, and executing control instructions and realizing specific functions.
  • the terminal device 200 includes, but is not limited to, a smart display device, a smart terminal, a smart home appliance, a smart gateway, a smart lighting device, a smart audio device, a game device, and the like.
  • the multiple terminal devices 200 constituting the voice control system may be of the same type or of different types. For example, as shown in FIG. 7 , in the same voice control system, smart TVs, smart speakers, smart refrigerators, multiple smart lamps, etc. may be included. These terminal devices 200 may be distributed in different locations, so as to meet usage requirements at corresponding locations.
  • the voice control system described in this application does not limit the scope of application of the solution to be protected. That is, in practical applications, the server, terminal device and voice control method provided by this application are not limited to the smart home field; they apply equally to other systems that support intelligent voice control, such as smart office systems, smart service systems, smart management systems, and industrial production systems.
  • a terminal device 200 with a display function may include at least one of a tuner-demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface.
  • the controller 250 includes a CPU, a video processor, an audio processor, a graphics processor, a RAM, a ROM, and a first interface to an nth interface for input/output.
  • the display 260 includes a display screen component for presenting images and a drive component for driving image display. It receives image signals output from the controller and displays video content, image content, components of a menu manipulation interface, a user manipulation UI interface, and the like.
  • the display 260 may be at least one of a liquid crystal display, an OLED display, and a projection display, and may also be a projection device and a projection screen.
  • the tuner-demodulator 210 receives broadcast TV signals through wired or wireless reception, and demodulates audio and video signals, as well as data signals such as EPG data, from multiple wireless or cable broadcast TV signals.
  • the external device interface 240 may include, but is not limited to, any one or more of the following: a high-definition multimedia interface (HDMI), an analog or data high-definition component input interface (component), a composite video input interface (CVBS), a USB input interface (USB), an RGB port, and the like. It may also be a composite input/output interface formed by several of the above interfaces.
  • the controller 250 controls the work of the smart device and responds to user operations through various software control programs stored in the memory.
  • the controller 250 controls overall operations of the terminal device 200 . For example, in response to receiving a user command for selecting a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.
  • the user can input user commands through a graphical user interface (GUI) displayed on the display 260, and the user input interface receives user input commands through the graphical user interface (GUI).
  • the user may input a user command by inputting a specific sound or gesture, and the user input interface recognizes the sound or gesture through a sensor to receive the user input command.
  • the terminal device 200 also performs data communication with the server 400 .
  • the terminal device 200 may be allowed to communicate via a local area network (LAN), a wireless local area network (WLAN), and other networks.
  • the server 400 may provide various contents and interactions to the terminal device 200 .
  • the server 400 may be one cluster, or multiple clusters, and may include one or more types of server groups.
  • the terminal device 200-1 may have a built-in voice control system to support the user's intelligent voice control.
  • the intelligent voice control refers to an interactive process in which the user operates the terminal device 200-1 by inputting voice and audio data.
  • the terminal device 200-1 may include an audio input device and an audio output device.
  • the audio input device is used to collect voice and audio data input by the user, and may be a built-in or external microphone device of the terminal device 200-1.
  • the audio output device is used to emit sound to play the voice response. For example, as shown in FIG. 9, when the user inputs a wake-up word such as "Hi! Xiao X" through the audio input device, the terminal device 200-1 can play a voice response of "I'm here" through the audio output device to guide the user to complete the subsequent voice input.
  • the built-in intelligent voice system of the terminal device 200 also supports a single-utterance direct mode, that is, a "one-shot" mode.
  • the user can directly realize the control function with a small number of voice inputs. For example, in the traditional mode, if the user wants to control the terminal device 200 to play movie resources, the user needs to input the voice "Hi, X" first, and then input "I want to watch a movie" after the terminal device 200 responds with "I'm here"; the terminal device 200 then responds with "the following movies have been found for you".
  • In the "one-shot" mode, the user can directly input "Hi! X, I want to watch a movie", and the terminal device 200 will directly respond with "the following movies have been found for you" after receiving the voice command, which reduces the number of voice interactions and improves voice interaction efficiency.
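  • The difference between the traditional mode and the "one-shot" mode can be sketched as follows. This is a minimal Python illustration, assuming the recognized text is already available; the `WAKE_WORDS` tuple, the function name `handle_utterance`, and the returned mode labels are hypothetical, and simple prefix matching stands in for real wake-up word detection.

```python
# Placeholder wake-up words in lower case; a real system uses its own list.
WAKE_WORDS = ("hi! x", "hi, x")


def handle_utterance(text):
    """Classify an utterance and return (mode, command).

    "one-shot": wake-up word and command arrived in a single utterance.
    "awaiting command": only the wake-up word was spoken, so the device
    would answer "I'm here" and wait for the next utterance.
    "ignored": no wake-up word at the start of the utterance.
    """
    stripped = text.strip()
    lowered = stripped.lower()
    for wake in WAKE_WORDS:
        if lowered.startswith(wake):
            # Drop the wake-up word and any punctuation that follows it.
            command = stripped[len(wake):].lstrip(" ,!.")
            if command:
                return ("one-shot", command)
            return ("awaiting command", None)
    return ("ignored", None)


print(handle_utterance("Hi! X, I want to watch a movie"))
print(handle_utterance("Hi, X"))
```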
  • the user can control the linkage of multiple devices through intelligent voice.
  • the user can input a voice command "turn on the bedroom light" through the smart speaker, and the smart speaker can respond to the voice command by generating a control command for turning on the light, and then send the control command to the lamp named "bedroom" in the voice control system, so as to turn on the bedroom light.
  • the smart speaker also responds to the user's voice input, that is, it plays feedback voice content such as "the bedroom light has been turned on for you".
  • the control command can be transmitted directly to the controlled device by the terminal device 200-1 that receives the user's voice and audio data, or can be transmitted by the terminal device 200-1 to a relay device such as a router and then passed to the controlled device by the relay device.
  • the control instruction may also be transmitted to the controlled device through the server 400 .
  • the smart terminal 300 can first send the control command to the server 400, and the server 400 then forwards the control command to the terminal device 200 to be controlled.
  • the server 400 can issue control instructions and related data to any terminal device 200 independently.
  • the user can control the display device to request online playback of media assets through interactive operations, and the server 400 can feed back media asset data to the display device according to the playback request.
  • the server 400 can send control instructions and related data to the voice control system in a unified manner.
  • the smart speaker can send the control command input by the user to the server 400, and the server 400 sends feedback data to the voice control system, so that the voice control system sends a turn-on command to the bedroom lamp and feeds the control response back to the smart speaker.
  • Some terminal devices 200 in the voice control system can have a built-in complete voice control system.
  • This type of terminal device 200 can be used as the main control device: it can receive, process and respond to voice and audio independently, and can send the corresponding control instructions to other terminal devices 200.
  • a complete voice control system may be built in terminal devices 200 such as a display device, a smart speaker, and a smart refrigerator, so as to receive voice and audio input by a user.
  • Part of the terminal devices 200 in the voice control system may not have a complete intelligent voice system built in, and only serve as controlled devices to receive control instructions sent by the master control device.
  • smart devices such as lamps and small household appliances can receive control instructions from the display device as the main control device to start, stop or change operating parameters.
  • the same voice control system may include multiple terminal devices supporting the voice control system.
  • a smart TV, a smart speaker, and a smart refrigerator are set in the same room, and these terminal devices 200 all have built-in complete voice control systems, which can respond to voice commands input by users.
  • the ways of actually responding to voice commands and the types of supported voice commands are different. For example, as shown in FIG. 10, for the voice command "I want to watch a movie" input by the user, the smart TV can respond by displaying a list of movies and playing the voice content "the following movies have been found for you". The smart speaker and the smart refrigerator cannot respond, so they play the voice content "I can't understand what you are saying".
  • when the current voice control system includes multiple terminal devices 200 capable of supporting voice control, multiple terminal devices 200 may wake up at the same time or by mistake for the same voice command, resulting in scene confusion and seriously affecting the user experience.
  • the user can define a response device through the application program in the smart terminal 300 according to usage habits, and freely switch between different wake-up strategies. For example, the user can manually set the smart speaker as the main response device; the voice command input by the user is then responded to by the smart speaker, and control commands are sent to other terminal devices 200 through the smart speaker, so as to realize intelligent voice control of the terminal devices in the entire voice control system.
  • the method of controlling the wake-up policy in a user-defined manner requires the user to perform multiple manual switching operations, which is not intelligent enough.
  • the current execution process of multi-device wake-up is to determine which terminal device is currently woken up by communicating with each other between the devices to be woken up. There are great risks in this execution method.
  • because the wake-up process requires information interaction among the terminal devices 200, it cannot be guaranteed that all terminal devices 200 complete the information interaction within the specified time, which causes abnormal responses.
  • the wake-up delays of different types of terminal devices 200 are different, that is, the time from wake-up to response is different, so it cannot be guaranteed that different types of terminal devices 200 are all within the device information interaction time period when the wake-up decision is made. A terminal device 200 with a longer wake-up delay may not yet have received the wake-up word during the device information interaction and thus miss the interaction window, so that the terminal device 200 cannot respond to the voice and voice control becomes abnormal.
  • the voice control system includes a server 400 and multiple terminal devices 200 .
  • the server 400 should at least include a storage module 410 , a communication module 420 and a control module 430 .
  • the storage module 410 is configured to store the device status reported by the terminal device 200 .
  • the communication module 420 is configured to establish a communication connection with a plurality of terminal devices 200 to obtain device statuses reported by the terminal devices 200 and to issue control instructions and related data to the plurality of terminal devices 200 .
  • the control module 430 is configured to execute the program steps on the side of the server 400 in the voice control method, so as to issue a response command or a silence command to different terminal devices 200 .
  • the terminal device 200 in the voice control system should at least include an audio input device, an audio output device, a communicator 220 and a controller 250 .
  • the audio input device is configured to detect voice audio data input by the user.
  • the audio output device is configured to play the spoken response.
  • the communicator 220 is configured to establish a communication connection with the server 400 , so as to report the status of the device to the server 400 and receive a response instruction or a silent instruction issued by the server 400 .
  • the controller 250 is configured to execute the program steps on the terminal device 200 side in the voice control method, so as to complete the response of the intelligent voice control process.
  • the voice control method includes the following contents:
  • the terminal device 200 acquires voice and audio data input by the user.
  • the user can perform voice input in real time. The built-in audio input device of the terminal device 200 converts the sound signal input by the user into an electrical signal, and voice and audio data are obtained after a series of signal processing steps such as noise reduction, amplification, encoding, and conversion.
  • the user can input voice and audio data in various ways. That is, in some embodiments, the user can input voice and audio data through the built-in audio input device of the terminal device 200 . For example, the user can input the voice "Hi! Xiao X, I want to watch a movie" through the built-in microphone device on the terminal device 200, then the microphone can convert the voice signal into an electrical signal, and transmit it to the controller 250 for subsequent processing .
  • the user may also include a specific wake-up word in the input voice command.
  • the wake-up word is a piece of speech containing specific content, such as "Hi! Xiao X", "Xiao X Xiao X", "Hey! X" and so on.
  • the terminal device 200 can first judge whether the voice input by the user contains a wake-up word, and only then perform subsequent processing, so as to alleviate false triggering of the intelligent voice control process.
  • the terminal device 200 closer to the user will be the first to detect the user's voice and audio data.
  • the terminal device 200 that responds to the voice is uncertain, that is, it may be a device closer to the user or a device farther away from the user. For example, when a user inputs the voice "Hi! X, I want to watch a movie" in the bedroom, the smart speaker in the bedroom will detect the voice and audio data first, but the smart speaker does not have a video playback function, while the smart TV in the living room does.
  • the terminal device 200 will generate a voice instruction according to the voice and audio data.
  • the voice command is a control command, which has a specific command format, including control action functions, control object codes, and the like.
  • the terminal device 200 can first convert the voice and audio data into text through the voice processing module in the intelligent voice system, that is, convert the waveform data in the voice and audio data into text data.
  • the terminal device 200 can use a word segmentation tool to convert unstructured text data into structured text data. That is, the terminal device 200 can remove meaningless text content such as modal particles and auxiliary words in the text data by means of thesaurus matching, retain keywords in the text data, and separate multiple keywords according to word meanings to obtain structured text.
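  • The conversion from unstructured text to structured keywords can be sketched as follows. This is a toy Python illustration for English text: a regular expression stands in for a real word segmentation tool, and the `STOP_WORDS` set is a hypothetical thesaurus of filler words, neither of which is specified by the application.

```python
import re

# Hypothetical thesaurus of filler words to discard.
STOP_WORDS = {"um", "i", "want", "to", "a", "the", "please"}


def structure_text(text):
    """Split recognized text into tokens and keep only the keywords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


print(structure_text("Um, I want to watch a movie"))
```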
  • the terminal device 200 may also input the structured text into the word processing model.
  • the word processing model is an artificial intelligence model based on machine learning. After the text data is input, the word processing model can calculate and determine the classification probability that the text information belongs to a specific semantic meaning. Therefore, by using various standard control instructions as classification labels, the word processing model can output the classification probability of text data for each standard control instruction, where the standard control instruction with the highest classification probability is the control instruction corresponding to the voice and audio data .
  • the word processing model can be obtained by repeatedly training the initial model by using sample data and set input and output rules.
  • the sample data is text information with labels.
  • the sample data can be used as the input and the classification probability as the output to perform calculations on the sample data. The output result is compared with the label in the sample data to obtain the training error, and the training error is then backpropagated, that is, the model parameters are adjusted according to the training error. After a large amount of sample data has been input repeatedly, a word processing model that outputs accurate recognition results is obtained.
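  • The training procedure described above can be sketched with a very small classifier. This is a pure-Python illustration under stated assumptions, not the model used in the application: a single-layer softmax classifier over bag-of-words features replaces the word processing model, its cross-entropy gradient plays the role of the backpropagated training error, and the labels, samples, and hyperparameters are invented for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into classification probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def featurize(words, vocab):
    """Bag-of-words vector: 1.0 where a vocabulary keyword is present."""
    return [1.0 if v in words else 0.0 for v in vocab]


def probabilities(weights, x):
    """Classification probability of input x for every label."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def train(samples, vocab, labels, epochs=200, lr=0.5):
    """Adjust the weights from the error between output and sample label."""
    weights = [[0.0] * len(vocab) for _ in labels]
    for _ in range(epochs):
        for words, label in samples:
            x = featurize(words, vocab)
            probs = probabilities(weights, x)
            for k, row in enumerate(weights):
                error = probs[k] - (1.0 if labels[k] == label else 0.0)
                for j, xj in enumerate(x):
                    row[j] -= lr * error * xj
    return weights


def predict(weights, vocab, labels, words):
    """Return the standard control instruction with the highest probability."""
    probs = probabilities(weights, featurize(words, vocab))
    return labels[probs.index(max(probs))]


# Hypothetical labelled samples: structured keywords -> control instruction.
LABELS = ["movie_play", "music_play", "light_power_on"]
SAMPLES = [
    (["watch", "movie"], "movie_play"),
    (["listen", "music"], "music_play"),
    (["turn", "on", "light"], "light_power_on"),
]
VOCAB = sorted({w for words, _ in SAMPLES for w in words})
weights = train(SAMPLES, VOCAB, LABELS)
print(predict(weights, VOCAB, LABELS, ["watch", "movie"]))
```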
  • the terminal device 200 can convert the voice and audio data input by the user into voice instructions.
  • the controlled device or the server 400 can directly process the voice command after receiving the voice command, such as executing control actions according to the voice command and extracting service requirement information from the voice command.
  • the terminal device 200 can also directly send the voice and audio data as the voice command. That is, a terminal device 200 with low data processing capability, or without a complete built-in voice control system, can forward the voice and audio data directly, and the server 400 or another terminal device 200 performs the language processing, so as to reduce the computing load of the current terminal device 200.
  • the terminal device 200 may send the voice command to the server 400 to trigger the server 400 to perform control on the wake-up process of multiple terminal devices 200 .
  • the voice control system may include multiple terminal devices 200 with built-in voice control systems, when the user inputs voice, the multiple terminal devices 200 in the voice control system can all detect voice and audio data.
  • the server 400 may suspend the voice command generation process and the voice command reporting process in other terminal devices 200 after receiving a voice command.
  • for example, after receiving the voice command sent by the smart TV, the server 400 can send a control command for suspending command generation and command transmission to the smart speaker and the smart refrigerator in the voice control system where the smart TV is located. After receiving the control command, both the smart speaker and the smart refrigerator stop generating and sending voice commands. Since a terminal device 200 with higher data processing capability can usually finish processing the voice and audio data in a shorter time, it can complete the generation of the voice instruction before other devices. Therefore, after receiving the first voice command, the server 400 stops the voice command generation and reporting process of the other terminal devices 200, which also shortens the voice command generation time and improves the voice response speed.
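  • The "first command wins" behaviour described above can be sketched as follows. This is a simplified Python illustration: the `CommandArbiter` class and its method names are hypothetical, and returning a list of device names stands in for actually sending the suspend control command over the network.

```python
class CommandArbiter:
    """Accept the first voice command and suspend the remaining devices."""

    def __init__(self, devices):
        self.devices = list(devices)
        self.accepted_from = None

    def receive(self, sender, command):
        """Return the devices that should stop generating and reporting.

        Only the first command is accepted; later reports are ignored and
        an empty list is returned for them.
        """
        if self.accepted_from is not None:
            return []
        self.accepted_from = sender
        return [device for device in self.devices if device != sender]


arbiter = CommandArbiter(["smart TV", "smart speaker", "smart refrigerator"])
print(arbiter.receive("smart TV", "movie_play"))
```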
  • the server 400 may analyze the service requirement information in the voice command. For different voice commands input by the user, the control content contained therein is also different, so they have different service requirements. For example, when the user inputs the voice "Hi! X, I want to listen to music", after processing by the terminal device 200, a voice command is generated, and the voice command includes the service requirement of "play music” (music_play). When the user enters the voice "Hi! X, turn on the bedroom light”, a voice command containing the business requirement of "turn on the lamp” (light_power on) is generated.
  • the server 400 can directly extract the service requirement information from the summary of the voice instruction. When the voice command is the voice and audio data uploaded by the terminal device 200, the server 400 can also recognize and process that voice and audio data in the same way that the terminal device 200 processes voice and audio data in the above-mentioned embodiment. In other words, the server 400 can also recognize the voice and audio data through built-in speech-to-text tools, text structuring tools, and word processing models, so as to identify the service requirement information in it.
  • a service requirement recognition model can be set, or the output classes of the above-mentioned word processing model can be set as service requirements, so that the model calculates the classification probability of the user's voice and audio data for each service requirement.
  • since the voice content input by the user may contain multiple user intentions, multiple service requirements may also be parsed from the corresponding voice instruction. For example, if the user inputs the voice "Hi! X, turn on the light in the living room and play a movie", the two service requirements "turn on the light" and "play a movie" can be parsed out from the voice command.
  • the voice control system can also realize richer voice interaction functions by presetting a richer instruction set, and then according to the set instruction set, the business requirements contained in it can be determined correspondingly. For example, if the user inputs the voice "Hi!
  • the voice control system can determine the control content of the voice control system according to the instruction set of "theater mode", including playing a movie and turning off the lights at the same time, so as to imitate the atmosphere of a movie theater . Therefore, the server 400 can parse out the two service requirements of "turn off the lamp” and "play a movie” from the voice command.
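  • Parsing several service requirements from one voice command, including the expansion of a preset instruction set such as "theater mode", can be sketched as follows. This is a keyword-matching Python illustration under stated assumptions: the phrase tables, the identifiers such as `light_power_off`, and the function name `parse_requirements` are hypothetical, and a real system would rely on the recognition model described above rather than substring matching.

```python
# Hypothetical preset instruction sets: mode name -> service requirements.
INSTRUCTION_SETS = {"theater mode": ["light_power_off", "movie_play"]}

# Hypothetical phrase table: spoken phrase -> single service requirement.
PHRASE_REQUIREMENTS = {
    "turn on the light": "light_power_on",
    "play a movie": "movie_play",
    "listen to music": "music_play",
}


def parse_requirements(text):
    """Return every service requirement found in the recognized text."""
    lowered = text.lower()
    for mode, requirements in INSTRUCTION_SETS.items():
        if mode in lowered:
            return list(requirements)
    return [req for phrase, req in PHRASE_REQUIREMENTS.items() if phrase in lowered]


print(parse_requirements("Hi! X, turn on the light in the living room and play a movie"))
print(parse_requirements("Hi! X, theater mode"))
```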
  • Different service requirements correspond to different control operations performed by the terminal device 200, and correspond to different device states for the terminal device 200 that needs to respond to the voice command.
  • for example, a lamp can support on/off, brightness adjustment and other controls only when it is in the standby state; when the user cuts the lamp's power supply through the wall switch so that it goes offline, it cannot support on/off, brightness adjustment, or other controls.
  • the terminal device 200 can report the device status to the server 400 through a predetermined information reporting strategy.
  • the terminal device 200 may report the current device status to the server 400 at a fixed interval according to the data update frequency, and the server 400 may update the stored device status according to the status reported by the terminal device 200.
  • the server 400 may send a heartbeat command to the terminal device 200, and the terminal device 200 may feed back the current device status to the server 400 after receiving the heartbeat command, so that the server 400 can update the stored device status. If the server 400 sends a heartbeat command to the terminal device 200 and the terminal device 200 does not respond within a preset period, the server 400 may update the corresponding device status to offline.
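  • The status bookkeeping on the server side can be sketched as follows. This is a minimal Python illustration: the `StatusStore` class, its method names, and the numeric timeout are assumptions, and timestamps are passed in explicitly so that the example does not depend on a real clock or network.

```python
class StatusStore:
    """Keep the latest reported status per device and expire silent ones."""

    def __init__(self, timeout):
        self.timeout = timeout  # preset period, in the same unit as `now`
        self.status = {}
        self.last_seen = {}

    def report(self, device, status, now):
        """Record a status report or heartbeat reply from a device."""
        self.status[device] = status
        self.last_seen[device] = now

    def sweep(self, now):
        """Mark as offline every device that stayed silent past the timeout."""
        for device, seen in self.last_seen.items():
            if now - seen > self.timeout:
                self.status[device] = "offline"


store = StatusStore(timeout=30)
store.report("lamp", "standby", now=0)
store.report("smart TV", "standby", now=20)
store.sweep(now=40)
print(store.status)
```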
  • the device state of the terminal device 200 may also be triggered to be reported through a voice command. That is, the server 400 may acquire the voice and audio data corresponding to the voice command, and recognize the wake-up word from the voice and audio data. If the voice and audio data includes the wake-up word, locate the voice control system where the terminal device 200 is located, so as to send a status acquisition request to the voice control system. All terminal devices 200 in the voice control system may report the device status after receiving the status acquisition instruction.
  • for example, if the server 400 recognizes the wake-up word "Hi! Xiao X" in the voice and audio data, it can determine the voice control system currently used by the user, that is, "X's home system", according to the identification information of the terminal device 200.
  • this voice control system has a smart TV, speaker A, and speaker B in the living room; a lamp and speaker C in the bedroom; and a smart refrigerator in the kitchen. The server 400 then sends a status acquisition request to the voice control system, so that the TV, speaker A, speaker B, the lamp, speaker C, and the smart refrigerator report their current device status.
  • the server 400 may screen the second terminal device according to the service requirement information and the device status information.
  • the second terminal device is an intelligent device whose device status can realize service requirement information.
  • in the process of screening the second terminal device, the server 400 can perform multi-level screening on the terminal devices 200 in the current voice control system according to different preconditions. For example, if the user inputs the voice "Hi! Xiao X, turn on the light", the corresponding service requirement is "turn on the light".
  • the preconditions required to realize this service requirement are that the device type is a lamp and the device status is standby. The server 400 can therefore first filter out all terminal devices 200 whose type is a lamp in the current voice control system according to the device type, and then filter out, according to the device status, the lamps that are in the standby state as the second terminal device.
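  • The two screening levels described above can be sketched as follows. This is a minimal Python illustration: the device records, their field names, and the two helper functions are hypothetical, and plain dictionaries stand in for the device statuses stored on the server.

```python
def screen_by_type(devices, device_type):
    """First level: keep only devices of the required type."""
    return [d for d in devices if d["type"] == device_type]


def screen_by_status(devices, status):
    """Second level: keep only devices in the required status."""
    return [d for d in devices if d["status"] == status]


# Hypothetical device statuses stored on the server.
DEVICES = [
    {"name": "living room lamp", "type": "lamp", "status": "standby"},
    {"name": "bedroom lamp", "type": "lamp", "status": "offline"},
    {"name": "smart TV", "type": "tv", "status": "standby"},
]

lamps = screen_by_type(DEVICES, "lamp")
second_terminal_devices = screen_by_status(lamps, "standby")
print([d["name"] for d in second_terminal_devices])
```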
  • After the second terminal device is screened out, the server 400 sends a response command to the terminal device 200 serving as the second terminal device, which responds to the voice control function by running the response command. At the same time, the server 400 sends a silent command to the terminal devices 200 other than the second terminal device in the current voice control system, so that these devices run the silent command and do not respond to the voice control function.
  • The terminal devices 200 supporting voice interaction in the home environment report the received voice command and their device status to the server 400; that is, the voice command "Turn on the light" and the device status (standby) are reported to the server 400.
  • The server 400 can determine that the current device status of a lamp matches the object category corresponding to the service requirement in the current user voice command. The server 400 can therefore issue a response command for waking up the lamp and, at the same time, issue a silent command to the other devices. When the devices execute their respective commands, the lamp that meets the service requirement and device status is turned on, while the other terminal devices 200, which do not meet the service requirement and device status, remain silent.
  • The voice control method provided in the above embodiment can use the service requirement information contained in the voice command and the device status reported by the terminal devices 200 to screen out, in the current voice control system, the second terminal device capable of responding to the voice command, send a response command to the second terminal device, and send a silent command to the other devices at the same time. In this way, after the intelligent voice system receives the voice command input by the user, each terminal device 200 exchanges information only with the server 400, and the second terminal device is determined automatically by the server 400. This reduces data interaction among the terminal devices 200 and alleviates the low execution efficiency caused by frequent communication between multiple devices.
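  • The two-level screening and command dispatch described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment; the `Device` record, the `screen_and_dispatch` function, and the device identifiers are hypothetical names introduced here, and the command labels "response" and "silent" simply mirror the response command and silent command in the text.

```python
from dataclasses import dataclass


@dataclass
class Device:
    # Hypothetical record of what a terminal device reports to the server.
    device_id: str
    device_type: str  # e.g. "lamp", "speaker"
    status: str       # e.g. "standby", "playing"


def screen_and_dispatch(devices, required_type, required_status):
    """Return {device_id: "response" | "silent"} for every reporting device."""
    # Level 1: keep only devices of the type the service requirement needs.
    by_type = [d for d in devices if d.device_type == required_type]
    # Level 2: among those, keep devices whose reported status matches.
    selected = {d.device_id for d in by_type if d.status == required_status}
    # Selected devices get a response command, all others a silent command.
    return {
        d.device_id: ("response" if d.device_id in selected else "silent")
        for d in devices
    }


devices = [
    Device("tv", "tv", "standby"),
    Device("lamp-bedroom", "lamp", "standby"),
    Device("speaker-a", "speaker", "playing"),
]
commands = screen_and_dispatch(devices, "lamp", "standby")
```

  • Here only the lamp in standby receives the response command; every other device is told to stay silent, so no device needs to negotiate with another.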
  • When the user inputs voice, the execution device may be explicitly specified in the voice instruction. For example, if the voice content input by the user is "turn on the TV", the execution device is specified as the TV. Because the execution device is explicit, the server 400 can transmit the voice command directly to the TV, and the execution device can be determined without parsing the service requirement information. Therefore, in some embodiments, after the server 400 receives the voice command reported by the terminal device 200-1, it may also detect whether an execution device is specified in the voice command. If no execution device is specified, the second terminal device is screened by parsing the service requirement information and matching it against the device status in the manner provided in the above embodiment.
  • If an execution device is specified in the voice instruction, a control command and feedback voice information can be generated according to the voice instruction.
  • The control command is a command that corresponds to the voice command and is directed at the execution device.
  • For example, for the voice "turn on the TV", the generated control command is "TV_power on".
  • Feedback voice information is a kind of voice audio sent out for the voice content, which is used to remind the user of the execution result of the instruction. For example, when the user enters the voice "Turn on the TV", the intelligent voice system will play the feedback voice information of "Turn on the TV for you" after turning on the TV.
  • the control command and the feedback voice information can be sent to the specific terminal device 200, so as to implement the service corresponding to the control command by executing the control command, and prompt the user of the service execution result by playing the feedback voice information.
  • Both the control command and the feedback voice information can act on the execution device. For example, when the user inputs the voice "turn on the TV", the TV powers on in response and, at the same time, plays the voice feedback "I have been asked to turn on the TV" through its intelligent voice system and speakers.
  • the server 400 may send control commands and feedback voice information to different terminal devices 200 respectively. That is, the server 400 may send the control command to the execution device according to the identification information of the execution device, and send the feedback voice information to the smart device that inputs the voice command.
  • For example, the smart air conditioner with the intelligent voice system in the bedroom first detects the voice audio data, generates a voice command, and sends it to the server 400. The server 400 can determine from the voice command that the device to be turned on is the TV, and generate the "TV_power on" control command and the feedback voice information "Turn on the TV for you". It then sends the control command to the TV in the living room to turn the TV on, and sends the feedback voice information to the smart air conditioner, which plays the voice feedback "Turn on the TV for you" in the bedroom.
  • When an execution device is explicitly specified in the voice command, the voice command can thus be responded to by both the execution device and the terminal device 200 that received the voice input, which meets the service requirement and gives the user better feedback.
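  • As a hedged sketch of this routing, the snippet below sends the control command to the execution device and the feedback voice information to the device that received the voice input. The lookup table `EXPLICIT_COMMANDS`, the function `route`, and the device identifiers are assumptions made for illustration; only the strings "TV_power on" and "Turn on the TV for you" come from the description.

```python
# Hypothetical table: utterance -> (execution device, control command, feedback text).
EXPLICIT_COMMANDS = {
    "turn on the tv": ("tv-livingroom", "TV_power on", "Turn on the TV for you"),
}


def route(voice_text, input_device_id):
    """Return a list of (target device, message kind, payload), or None."""
    entry = EXPLICIT_COMMANDS.get(voice_text.strip().lower())
    if entry is None:
        # No explicit execution device: the caller falls back to screening
        # by service requirement information and device status.
        return None
    execution_device, control_command, feedback_text = entry
    return [
        (execution_device, "control", control_command),  # executes the service
        (input_device_id, "feedback", feedback_text),    # plays the voice feedback
    ]


messages = route("Turn on the TV", "aircon-bedroom")
```

  • With the air conditioner in the bedroom as the input device, the TV receives the control command while the air conditioner plays the feedback, matching the example above.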
  • Because the voice control system may contain multiple terminal devices 200, and different terminal devices 200 may support the same service requirement and be in the same device state at the same time, screening the devices by the methods in the above embodiments may yield multiple second terminal devices.
  • If the server 400 then sends a response command directly to every terminal device 200 serving as a second terminal device, multiple terminal devices 200 will respond to one voice command at the same time, and the problem of scene confusion remains.
  • The server 400 may therefore refine the screening by adding screening conditions, so as to reduce the number of terminal devices 200 serving as second terminal devices. That is, in some embodiments, the service requirement information may further include a service type and a service status. When the server 400 screens the second terminal device according to the service requirement information, it may extract the service type and service status from the service requirement information and match candidate devices that satisfy the service type, where a candidate device has the device type required by the service type. It then traverses the device states of the candidate devices to screen out the second terminal device whose device state conforms to the service status.
  • The terminal devices 200 in the home environment report the received voice command together with their current device type and device status to the cloud server 400, that is, the device type (music) and the device status (playing).
  • The server 400 can screen the device types and device statuses in the current voice control system according to the service type and service status required by the voice command, and determine that there is a speaker that is playing music, whose current device type and device status conform to the object category of the current user's voice command. The server 400 can therefore send a response command to that speaker and, at the same time, send a silent command to the other devices, so that the speaker executes the response command and turns off the music.
  • If the service requirement information further includes a service execution location, the server 400 may further screen the terminal devices 200 according to the service execution location to determine the second terminal device. That is, when screening the second terminal device according to the service requirement information, the server 400 can extract the service execution location from the service requirement information and obtain the device location of each candidate device in the current voice control system. If the device location of a candidate device coincides with the service execution location, the candidate device satisfies the service execution location, and the step of traversing the device states of the candidate devices may be performed to screen out the second terminal device whose device state conforms to the service status. If the device location of a candidate device does not coincide with the service execution location, the candidate device is marked as not being the second terminal device; that is, it can be deleted from the candidate device list.
  • the terminal device 200 in the home environment will receive the user instruction and report the current device type and device status to the cloud server, that is, the device type (none) and device status (standby).
  • The server 400 can parse the service execution location "bedroom" from the voice command, screen the terminal devices 200 in the current voice control system according to the service execution location, and determine the terminal devices 200 whose device location is within the bedroom. When it determines that a speaker in the bedroom has the required device type and a device status that conforms to the object category controlled by the current user instruction, the server 400 may issue a response instruction to that speaker.
  • The server 400 also sends a silent command to the other devices in the current voice control system, including the other devices in the bedroom and the devices outside the bedroom.
  • The voice control method provided in the above embodiment can perform multiple rounds of screening on the terminal devices 200 in the voice control system based on service requirement information such as service type, service status, and service execution location, so as to narrow the second terminal devices down to a small number of terminal devices 200. This reduces the communication frequency between the terminal devices 200 and improves the execution efficiency of the intelligent voice control process.
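  • The multiple rounds of screening by service type, service execution location, and service status might look like the sketch below. The `Candidate` record, the `multi_round_screen` function, and the assumption that the device type required by the service type is given as a plain string (here "speaker") are illustrative choices, not details taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical per-device record held by the server.
    device_id: str
    device_type: str
    location: str
    status: str


def multi_round_screen(devices, service_type, service_status, location=None):
    """Return the ids of devices that pass every screening round."""
    # Round 1: candidate devices have the device type the service type requires.
    candidates = [d for d in devices if d.device_type == service_type]
    # Round 2 (only if the voice command names a location): drop candidates
    # whose device location does not coincide with the service execution location.
    if location is not None:
        candidates = [d for d in candidates if d.location == location]
    # Round 3: traverse candidate states and keep those matching the service status.
    return [d.device_id for d in candidates if d.status == service_status]


home = [
    Candidate("speaker-a", "speaker", "living room", "playing"),
    Candidate("speaker-b", "speaker", "living room", "standby"),
    Candidate("speaker-c", "speaker", "bedroom", "playing"),
    Candidate("lamp", "lamp", "bedroom", "standby"),
]
playing_anywhere = multi_round_screen(home, "speaker", "playing")
playing_in_bedroom = multi_round_screen(home, "speaker", "playing", "bedroom")
```

  • Without a location, two speakers remain; adding the service execution location narrows the result to the single bedroom speaker.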
  • the server 400 can screen out the second terminal device that can respond to the control instruction from among the many terminal devices 200 .
  • Although the number of terminal devices 200 serving as second terminal devices can be greatly reduced, in some screening processes many terminal devices 200 still meet the service requirement, whereas the user's voice control process usually requires only one or a few specific second terminal devices to respond.
  • the server 400 may further determine the final execution device from among the screened multiple terminal devices 200 that can meet service requirements. That is, when screening the second terminal device according to the service requirement information, the server 400 may perform the following steps:
  • S202 If the number of terminal devices is equal to 1, that is, only one terminal device 200 in the current voice control system can meet the current service requirement, the server 400 can directly mark that terminal device 200 as the second terminal device.
  • S203 If the number of terminal devices is greater than or equal to 2, search for a master device.
  • the master device is one of multiple terminal devices capable of realizing service requirement information.
  • the main device may perform further interaction with the user to determine the second terminal device that finally responds to the voice instruction.
  • The server 400 may send an inquiry instruction to the master device, so that the master device plays an inquiry voice, where the inquiry instruction is a multi-round, wake-up-free voice interaction instruction. The server 400 then receives, through the master device, the confirmation voice command input by the user, and extracts the identification information of the second terminal device from the confirmation voice command, so as to select the second terminal device, according to that identification information, from among the multiple smart devices that can fulfill the service requirement information.
  • the server 400 can filter out the terminal device 200 that meets the service requirement according to the service requirement in the voice command.
  • the server 400 may send a response instruction to the speaker A, and send a silence instruction to other terminal devices 200 including the speaker B.
  • the main device may be the terminal device 200 closest to the location of the sound source corresponding to the voice command.
  • When the server 400 searches for the master device, it can obtain the voice audio data in which each of the multiple smart devices capable of fulfilling the service requirement information detected the voice command, extract a sound energy value from each device's voice audio data, and compare the sound energy values to find the terminal device 200 with the highest sound energy value, which is then marked as the master device.
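  • A minimal sketch of the count-based branching and the energy-based choice of master device follows. The function name `pick_responder_or_master`, the result labels, and the `energy_by_device` mapping are hypothetical; the sound energy values are assumed to have already been extracted from each device's voice audio data.

```python
def pick_responder_or_master(candidates, energy_by_device):
    """candidates: ids of devices able to fulfill the service requirement.

    Returns ("none", None), ("second_terminal", id), or ("master", id).
    """
    if not candidates:
        return ("none", None)
    if len(candidates) == 1:
        # Exactly one capable device: mark it directly as the second terminal device.
        return ("second_terminal", candidates[0])
    # Two or more capable devices: the one that heard the user loudest
    # (highest sound energy value) becomes the master device and asks the user.
    master = max(
        candidates,
        key=lambda dev: energy_by_device.get(dev, float("-inf")),
    )
    return ("master", master)


energies = {"speaker-a": 0.82, "speaker-b": 0.35}
single = pick_responder_or_master(["speaker-c"], energies)
several = pick_responder_or_master(["speaker-a", "speaker-b"], energies)
```

  • A device with no reported energy value is treated as the quietest, which is an assumption of this sketch rather than something the description specifies.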
  • The reverberation time parameter T60 is fixed for a specific scene; that is, the time required for the energy to attenuate by 60 dB is the same at any position, and T60 can be estimated from the energy ratio of the direct sound to the reverberant sound at the corresponding position. Therefore, based on the beamformed spectrogram and the arrival time difference of the sound source, the energy ratio of the direct sound to the reverberant sound from the sound source can be calculated for every terminal device 200 in the environment, and the direct sound energy can then be calculated. By ranking the direct sound energy of the sound source received by each device, the terminal device 200 closest to the position of the sound source can be determined as the master device.
  • The master device may also be determined by other methods. That is, in some embodiments, the detection of the distance between the sound source position and the terminal devices 200 can also be completed by the terminal devices 200 themselves: a terminal device 200 can acquire images of the current environment through multi-lens cameras, construct a three-dimensional space model from the images taken at different angles, and then extract a portrait from the three-dimensional space model by image recognition, so as to locate the position of the user in the three-dimensional space model, that is, the position of the sound source.
  • After locating the position of the sound source, the terminal device 200 determines the distance between the position of the sound source and each terminal device 200 according to the placement of devices in the current smart home model, and sends the calculated distances to the server 400, so that the server 400 can determine that the terminal device 200 closest to the sound source is the master device.
  • In this way, the server 400 can further select, from among multiple terminal devices 200, the second terminal device that finally responds to the voice control, so that the devices do not need to communicate frequently with one another before the voice control process, and the response speed of the voice interaction process is improved.
  • The server 400 can determine the second terminal device and, by issuing a response instruction, have the second terminal device respond interactively to the voice input by the user. Because the interactive response can make the second terminal device perform specific interactive actions, and these actions may change the device status of the terminal device 200, the server 400 can also, after sending the response command, obtain the device state of the second terminal device after it has executed the response command, so as to update the stored device state in real time.
  • the server 400 may receive the execution result data reported by the second terminal device after sending a response instruction to the second terminal device.
  • The execution result data includes the new state of the device after running the response instruction. The server 400 then extracts the new device state from the execution result data and uses it to update the device state stored in the storage module.
  • In this way, the device state stored in the server 400 is kept consistent, in a timely manner, with the actual device state of the terminal devices 200 in the voice control system, so that in subsequent intelligent voice interaction processes the server 400 can screen the terminal devices 200 based on the updated device state and determine the second terminal device more accurately.
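  • The state update can be illustrated with a small in-memory store. This is only a sketch: the class `DeviceStateStore`, its methods, and the shape of the execution result data (a dictionary with "device_id" and "new_state" keys) are assumptions, since the description does not define a concrete data format.

```python
class DeviceStateStore:
    """Hypothetical stand-in for the device states kept in the storage module."""

    def __init__(self, initial=None):
        self._states = dict(initial or {})

    def get(self, device_id):
        # Returns None when no state has been stored for the device.
        return self._states.get(device_id)

    def apply_execution_result(self, result):
        # Extract the new device state from the reported execution result
        # and overwrite the stored state for that device.
        self._states[result["device_id"]] = result["new_state"]


store = DeviceStateStore({"lamp-bedroom": "standby"})
store.apply_execution_result({"device_id": "lamp-bedroom", "new_state": "on"})
```

  • A later "turn on the light" request would then no longer match this lamp on a standby precondition, because the stored state already reflects that it is on.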
  • a server 400 including: a storage module 410 , a communication module 420 and a control module 430 .
  • the control module 430 is configured to perform the following program steps:
  • S303 Screen the second terminal device according to the service requirement information, where the second terminal device is an intelligent device whose device status can realize the service requirement information.
  • S305 Send a silent command to other smart devices other than the second terminal device in the current voice control system.
  • a terminal device 200 is also provided in some embodiments of the present application, including: an audio input device, an audio output device, a communicator 220 and a controller 250 .
  • the controller 250 is configured to perform the following program steps:
  • S401 Acquire voice and audio data input by a user for performing voice control.
  • S402 Generate a voice instruction according to the voice audio data.
  • S403 Send a voice instruction to the server, so that the server parses the service requirement information in the voice instruction, and screens a second terminal device according to the service requirement information, and the second terminal device is an intelligent device whose device status can realize the service requirement information.
  • S404 Receive a response instruction or a silent instruction sent by the server.
  • the server 400 and the terminal device 200 provided in the above embodiment can form a voice control system for implementing the above voice control method.
  • After the user inputs a voice command, the server 400 may parse the service requirement information from the voice command and, according to the service requirement information, screen out the second terminal device whose current device status can fulfill the service requirement, so as to send a response command to the second terminal device and have the smart device serving as the second terminal device make a voice response. At the same time, according to the screening result, the server 400 sends a silent instruction to the devices in the current voice control system other than the second terminal device, so that the terminal devices 200 that are not the second terminal device do not respond to the voice control function.
  • the server 400 can pre-process voice commands, so that all types of terminal devices 200 can quickly and efficiently make correct wake-up responses within a specified time, and solve the problem of abnormal responses in traditional voice wake-up methods.
  • FIG. 18 is an application scenario diagram of a terminal device provided by an embodiment of the present application.
  • terminal devices such as smart TV 200-5, smart air conditioner 200-2, smart refrigerator 200-4, and smart washing machine 200-3 in the home can be connected to smart terminal 300 and server 400 through the Internet of Things module.
  • data transmission can be performed through a local area network or a wide area network, so as to realize the control and management of the terminal equipment.
  • IoT modules can be built into individual end devices.
  • The first terminal device is generally a device that has certain functions of its own. For received voice, if no terminal device is specified, the first terminal device itself is taken to be the device that responds; if the terminal device to be operated is specified, a control command is sent to the specified terminal device. For example, taking a smart speaker as the first terminal device: after receiving "fast forward", it first judges whether the smart speaker has an operation corresponding to the command; if so, it executes the fast-forward operation, and if not, it feeds back a prompt indicating that the command was not recognized. If "TV fast forward" is received, the smart speaker first judges, according to the mapping relationship, whether the TV has a corresponding operation. If it does, the smart speaker sends the TV an operation command that triggers fast forward; if it does not, the smart speaker feeds back a prompt indicating that the command was not recognized.
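  • The smart speaker behaviour in this example can be sketched as a small dispatch function. The sets `LOCAL_OPERATIONS` and `REMOTE_MAPPINGS`, the function `handle`, and the result labels are hypothetical, and matching a device by the leading word of the utterance is a simplification chosen for illustration.

```python
# Operations the first terminal device (a smart speaker here) can run itself.
LOCAL_OPERATIONS = {"fast forward", "pause"}
# Hypothetical mapping relationship: other device -> operations it supports.
REMOTE_MAPPINGS = {"tv": {"fast forward", "pause"}}


def handle(utterance):
    """Return (action, target device, operation) for a recognized utterance."""
    text = utterance.strip().lower()
    for device, operations in REMOTE_MAPPINGS.items():
        prefix = device + " "
        if text.startswith(prefix):
            operation = text[len(prefix):]
            if operation in operations:
                # A device is named and the mapping exists: forward the command.
                return ("forward", device, operation)
            return ("unrecognized", None, None)
    if text in LOCAL_OPERATIONS:
        # No device named: the first terminal device responds itself.
        return ("execute_locally", None, text)
    return ("unrecognized", None, None)


local_result = handle("fast forward")
remote_result = handle("TV fast forward")
```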
  • the first terminal device may play a prompt to the user representing several terminal devices that have mappings.
  • the determination of the mapping relationship between the first terminal device and other terminal devices may be performed in a local module or in the cloud.
  • The local module can be placed inside the first terminal device or fixed to it; the two can also be separate objects.
  • FIG. 19 is another application scenario diagram of a terminal device provided in an embodiment of the present application.
  • For example, the user wants to listen to song XXX. After the user inputs the wake-up word (beginning "Hi!"), the first terminal device activates the voice application installed on it, and the user inputs the voice message "play XXX song" to the first terminal device. The first terminal device converts the voice message into a voice command through the voice application and transmits it to the server 400. After receiving the voice command, the server 400 queries the terminal devices currently configured by the user and finds a TV, a refrigerator, and an air conditioner. It then feeds back to the user, through the first terminal device, "You have 3 devices, which one do you want to use". If the user is in the living room at this time, the user may continue with the voice message "play it in the living room".
  • The server 400 then further determines that there are two terminal devices in the living room, a TV and an air conditioner, and feeds back to the user through the first terminal device "You have 2 devices in the living room, which one do you want to use?" The user needs to input "Let's play it on TV"; only then does the server 400 control the TV to play song XXX.
  • The first terminal device may be a device with a sound collection function, such as a smart remote controller or a smart speaker.
  • In the above process, the server 400 needs to perform multiple rounds of voice interaction with the user, with the user actively answering each prompt, before the second terminal device that will finally execute the command is determined.
  • To reduce the above multiple rounds of interaction with the user, the present application provides a server in some embodiments, and the server is configured to perform a voice interaction process.
  • the voice interaction process will be described below with reference to the accompanying drawings.
  • The method described as being executed in the server 400 may also be executed in another terminal device or in the first terminal device.
  • The following description uses execution in the server as an example.
  • FIG. 20 is a schematic flowchart of a voice control method provided by an embodiment of the present application. As shown in Figure 20, the method includes the following steps:
  • The first terminal device and other terminal devices such as TVs, refrigerators, and air conditioners are placed in the user's actual environment. Both the first terminal device and the other terminal devices can be connected to the server through the network. The user has, by operating one of the terminal devices in advance, logged all of these terminal devices into the same account, and the server stores the mapping relationship between the user ID of the account and the terminal devices. At any moment, the first terminal device may receive voice input by the user.
  • the user may send a voice command to the first terminal device based on his own needs, and control the corresponding second terminal device to execute the voice command by voice. For example, when the user wants to use the second terminal device to watch the XXX movie, he can enter the wake-up word "Hi! XX" into the first terminal device, and then input a voice message of "Play XXX movie" to the first terminal device.
  • The second terminal device and the first terminal device are two independent devices, and the second terminal device can have its own voice receiving device for voice control.
  • That is, the relationship between the second terminal device and the first terminal device is neither that between a TV and a remote control usable only with that TV, nor that between a TV's processor and the voice receiving device on the TV.
  • the wake word and voice command can be entered sequentially.
  • the first terminal device can recognize the voice instruction contained in the sentence containing the wake-up word.
  • the first terminal device can integrate functions of sound collection and speech analysis.
  • The sound collection function receives the voice message sent by the user.
  • The voice analysis function extracts the key part of the voice message sent by the user.
  • The key part reflects the user's intention or the task to be done. After the user's intention has been analyzed, it is converted into a voice command, which may be in an executable command format agreed between the first terminal device and the server 400.
  • the command format may include command (command) and parameters (parameter).
  • the voice commands are sent to the server 400 and received by the server 400 .
  • Adding the corresponding user identifier to the voice command makes it easier for the server 400 to identify which user sent the voice command, and to look up all terminal devices configured by that user.
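  • As an illustration of a command format carrying a command, its parameters, and the user identifier, the sketch below serializes the voice command as JSON. The choice of JSON, the field names "user_id", "command", and "parameters", and both helper functions are assumptions; the description only states that the format is agreed between the first terminal device and the server 400.

```python
import json


def build_voice_instruction(user_id, command, parameters):
    """Serialize a voice instruction on the first terminal device (hypothetical format)."""
    return json.dumps(
        {"user_id": user_id, "command": command, "parameters": parameters},
        sort_keys=True,
    )


def parse_voice_instruction(payload):
    """Recover the user identifier, command, and parameters on the server side."""
    message = json.loads(payload)
    return message["user_id"], message["command"], message["parameters"]


payload = build_voice_instruction("user-1", "play", {"title": "XXX movie"})
user_id, command, parameters = parse_voice_instruction(payload)
```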
  • the first terminal device has an independent audio-visual display system capable of decoding and playing audio and video streams.
  • the second terminal device also has an independent audio and video display system, which can decode and play audio and video streams.
  • S2001 Receive a voice instruction including a user identifier sent by a first terminal device.
  • the first terminal device may use the text converted from the voice as a voice instruction and send it to the server.
  • Alternatively, the voice conversion service resides on the server, so the first terminal device can send the received voice to the server in the agreed encapsulation structure, and the server decapsulates the received data and obtains the voice command.
  • the server 400 analyzes the voice command to obtain the user identifier and the user's intention conveyed by the voice command.
  • the server determines the device type corresponding to the command by analyzing the voice command and according to the keywords in the parsed text, where the device type refers to the functional authority of the terminal device.
  • mapping relationship between keywords and device types may be cached in the server, and in some embodiments, a trained keyword-device type neural network model may also be stored.
  • The server 400 searches the database for all terminal devices related to the user ID, that is, all terminal devices configured by the user. The reason is that, if device discovery were based on near-field communication, a device would discover every device it scans, including the user's own devices and/or the devices of other users nearby, some of which do not belong to the user. Even if discovery were restricted to the local area network, a guest device connected to the LAN could still be picked up. By establishing in advance, in a server with a management function, the relationship between the user ID and the user's own devices, the server 400 can determine the devices actually owned by the user according to the user ID.
  • the terminal device refers to other devices associated with the user identifier except the first terminal device.
  • the recognition of the voice command needs to determine the type of the voice command, and then compare it with the types that can be executed by each terminal device, and then determine the terminal device that can execute the voice command.
  • the execution types of each terminal device may be pre-calibrated, for example, the kitchen refrigerator corresponds to freezing, refrigeration, recipe recommendation, and ingredient identification. It may also be determined according to the device identifier when a newly added device is scanned. For example, when the newly added device identifier indicates that the device is a refrigerator, an association between the device identifier and freezing, refrigeration, recipe recommendation, food material identification, etc. is established.
  • the server 400 may also load all terminal devices and device attributes related to the user ID from the database into the cache. Subsequently, when the server 400 searches for the second terminal device, it may directly search in the cache, so that the speed at which the server 400 searches for the second terminal device can be accelerated.
  • the device attributes include inherent attributes such as the location of the terminal device, the name of the terminal device, and the ID of the terminal device.
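  • Loading a user's devices and their attributes into a cache on first lookup could be sketched as follows. The class `DeviceDirectory`, the dictionary standing in for the database, and the `db_reads` counter (included only to make the caching visible) are hypothetical.

```python
class DeviceDirectory:
    """Hypothetical lookup of terminal devices by user identifier with a cache."""

    def __init__(self, database):
        # database: dict mapping user id -> list of device attribute dicts.
        self._database = database
        self._cache = {}
        self.db_reads = 0

    def devices_for(self, user_id):
        if user_id not in self._cache:
            # First lookup for this user: read from the database and cache it.
            self.db_reads += 1
            self._cache[user_id] = list(self._database.get(user_id, []))
        # Later lookups are served from the cache.
        return self._cache[user_id]


directory = DeviceDirectory(
    {"user-1": [{"id": "tv-1", "name": "TV", "location": "living room"}]}
)
first_lookup = directory.devices_for("user-1")
second_lookup = directory.devices_for("user-1")
```

  • An empty result for a user corresponds to the case in which the user has not configured any terminal device.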
  • If the server 400 does not find any terminal device related to the user identifier, the user has not configured a terminal device. In this case, the server 400 feeds back to the first terminal device a parameter indicating that no terminal device exists, and the first terminal device announces, according to the received parameter, that there is no terminal device.
  • S2004 When there is a terminal device related to the user identifier, use preset filtering rules to filter out the most matching second terminal device, and feed back parameters characterizing the best matching second terminal device, so that the first terminal The device announces that there is a second terminal device that executes the voice command, and controls the most matching second terminal device to execute the voice command.
  • The preset filtering rule refers to a preset rule for selecting a device that conforms to the user's intention, that is, a device capable of executing the voice instruction. For example, when the user inputs the voice "play folk music", the first terminal device sends an instruction to play folk music to the server; the server recognizes that the type of the user's intention is playing music, and then determines, according to that type, the second terminal device that can perform this type of operation.
  • the terminal device related to the user identifier when there is a terminal device related to the user identifier, the terminal device related to the user identifier is used to execute the voice instruction.
  • For example, if the voice instruction is an instruction to play a video, the devices that cannot perform video playback are excluded.
  • the voice command can be executed by the cooling device.
  • the server 400 is configured with preset filtering rules.
  • the filtering rule characterizes the mapping relationship between the voice instruction and the terminal device.
  • The filtering rules include a first group of rules and a second group of rules, or only the first group of rules. The first group of rules are the rules necessary for screening out the best-matching second terminal device.
  • The second group of rules are the rules that are applied cumulatively, one by one, when no single best-matching second terminal device has yet been screened out.
  • the first group of rules includes the mapping relationship between device types and each terminal device, for example, the mapping relationship shown in Table 1 in the above embodiments.
  • the first set of rules may be a judgment on the number corresponding to user identifiers.
  • the second rule needs to be further used for further screening.
  • The optional second terminal devices are determined according to the device type and the mapping relationship of each device. If the number of second terminal devices is 1, the voice instruction is sent directly to that second terminal device. If the number is 0, the server feeds back to the first terminal device that there is no executable device. If the number is more than 1, one or any combination of location, installation time, usage frequency, the device that last executed this type of command, start time, signal strength, priority, and the like can be used as the second set of rules for filtering.
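  • The count-based branching described above can be sketched as follows. This is a minimal illustration in Python, not the implementation of this application: the mapping `INTENT_TO_DEVICE_TYPES`, the dictionary fields, and the returned action names are all hypothetical.

```python
# Hypothetical first-group rule: which device types can serve which intent type.
INTENT_TO_DEVICE_TYPES = {
    "play_music": {"smart_tv", "smart_speaker"},
    "cooling": {"air_conditioner"},
}


def candidates_for(intent_type, user_devices):
    """Keep the user's devices whose type is mapped to the intent type."""
    allowed = INTENT_TO_DEVICE_TYPES.get(intent_type, set())
    return [d for d in user_devices if d["type"] in allowed]


def next_action(candidates):
    """Branch on the number of candidate second terminal devices."""
    if not candidates:
        # Nothing can execute the instruction: tell the first terminal device.
        return "report_no_executable_device"
    if len(candidates) == 1:
        # Exactly one candidate: send the voice instruction to it directly.
        return "send_voice_instruction"
    # Several candidates remain: the second group of rules must narrow them down.
    return "apply_second_group_of_rules"


user_devices = [
    {"name": "living-room TV", "type": "smart_tv"},
    {"name": "bedroom speaker", "type": "smart_speaker"},
    {"name": "air conditioner", "type": "air_conditioner"},
]
print(next_action(candidates_for("play_music", user_devices)))
```

  • In this sketch an unknown intent type simply yields an empty candidate list, so it falls into the "no executable device" branch.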
  • FIG. 21 is a schematic flowchart of a screening terminal device provided by an embodiment of the present application.
  • the process of screening the second terminal device according to the filtering rules is as follows:
  • S2101 Use the first set of rules to screen the current terminal device and the terminal device related to the user identifier.
  • When there is a terminal device related to the user identifier, the server 400 first screens the terminal devices according to the rules that are necessary for screening out the best-matching second terminal device.
  • The first group of rules includes a terminal device function authority sub-rule, which refers to the functions that each terminal device has; for example, a smart TV has the function of playing media assets, and an air conditioner has the functions of cooling and heating. That is, after the server 400 has found the terminal devices actually owned by the user, it further uses the terminal device function authority sub-rule to screen those terminal devices and excludes the terminal devices that do not have the corresponding function authority.
  • When terminal devices are screened by using the terminal device function authority sub-rule, the server 400 detects the function authority of each of the multiple terminal devices related to the user identifier.
  • If a terminal device has the function authority to execute the voice command, the server 400 selects that terminal device as a candidate terminal device.
  • If a terminal device does not have the function authority to execute the voice command, the server 400 excludes that terminal device.
  • When the first set of rules is used to filter terminal devices related to the user identifier, if none of the terminal devices actually owned by the user has the corresponding function authority, the server 400 feeds back to the first terminal device a parameter indicating that no terminal device exists. The first terminal device announces that there is no terminal device according to the received parameter.
  • When the first set of rules is used to filter terminal devices related to the user identifier, if a terminal device actually owned by the user has the corresponding function authority, the server 400 also needs to confirm the number of terminal devices that have the corresponding function authority.
  • When the server 400 detects that there is only one terminal device with the authority to execute the voice command, it feeds back to the first terminal device a parameter representing the terminal device with the corresponding authority, and the first terminal device announces that terminal device according to the parameter.
  • the server 400 also needs to control the current terminal device to perform corresponding operations according to the user's intention in the voice command.
  • the server 400 sends the voice instruction to the current terminal device, and the current terminal device receives the voice instruction and performs a corresponding operation according to the voice instruction.
  • For example, the user is currently equipped only with a sweeping robot. If the user inputs the voice message "sweep the floor", the first terminal device transmits the voice message to the server 400. After querying, the server 400 finds that the current user is equipped only with a sweeping robot, which has a cleaning function. Therefore, the server 400 controls the first terminal device to announce "the sweeping robot starts to sweep the floor" and controls the sweeping robot to perform the sweeping function. When the server 400 controls the announcement of the first terminal device, it may feed back the parameter representing the terminal device with the corresponding authority to the first terminal device through a long-lived connection.
  • the sub-rules in the second set of rules are used to screen the multiple terminal devices that have the authority to execute the voice command one by one in order of priority from high to low.
  • the second set of rules preset in the server 400 includes a user frequency sub-rule, a distance from the first terminal device sub-rule, and a second terminal device priority sub-rule.
  • The user frequency sub-rule refers to the number of times the user has executed similar voice commands through the second terminal device, for example, the number of times the user has played video data such as movie A and variety show B through a smart TV, or the number of times the user has played audio data such as song C and song D through a smart speaker. Movie A, variety show B, song C, song D, and the like can all be classified as media asset data.
  • the distance sub-rule from the first terminal device refers to the distance between each second terminal device and the first terminal device.
  • the second terminal device priority sub-rule refers to the priority of each terminal device set by the user when executing voice commands. For example, corresponding to the voice commands for playing media assets, the priority of smart TVs is higher than that of smart refrigerator screens. Wherein, the priority of the terminal device can be set by the user through the application program on the smart terminal 300 .
  • The user can set corresponding priorities for the multiple sub-rules in the above-mentioned second group of rules; for example, the user usage frequency sub-rule can be given a higher priority than the distance from the first terminal device sub-rule, and the terminal device priority sub-rule can be given a higher priority than the user usage frequency sub-rule.
  • The server 400 uses the filtering rules to select the single most suitable second terminal device for executing the current voice command; that is, at the end of filtering, the number of best-matching second terminal devices is 1. After screening by the first set of rules, as long as more than one terminal device remains, the server 400 uses the sub-rules in the second set of rules to screen the remaining terminal devices one by one in order of priority.
  • the server 400 finds out that the user is equipped with a smart TV, a smart speaker, a smart air conditioner, a smart refrigerator, and a smart washing machine. After filtering, it is obtained that the smart TV, smart speaker, and smart refrigerator have playback functions, and the smart TV, smart speaker, and smart refrigerator need to be further screened through the second set of rules.
  • The server 400 further filters through the distance from the first terminal device sub-rule. Through the device location information in the device attributes of the terminal devices, the server 400 judges that the smart TV, the smart speaker, and the first terminal device are all in the living room, that is, their distance is relatively small, while the smart refrigerator is in the kitchen, so its distance from the first terminal device is relatively long.
  • The smart refrigerator is therefore excluded. Because two candidate terminal devices, the smart TV and the smart speaker, still remain and need to be further screened, the server 400 continues to filter through the terminal device priority sub-rule. The priority of the smart TV is higher than that of the smart speaker, so the server 400 selects the smart TV as the most matching terminal device.
  • the server 400 finds out that the user is equipped with a smart TV, a smart speaker, a smart air conditioner, a smart refrigerator, and a smart electric fan. After filtering by the device, it is obtained that the smart air conditioner and the smart electric fan have room temperature adjustment functions, and the server 400 needs to further screen the smart air conditioner and the smart electric fan through the second set of rules. The server 400 continues to filter through the terminal device priority sub-rules. For voice commands such as room temperature adjustment, the priority of the smart air conditioner set by the user is higher than that of the smart electric fan, so the server 400 selects the smart air conditioner as the most matching terminal equipment. Of course, users can also set the priority of smart electric fans higher than that of smart air conditioners according to their own needs.
  • When the user frequency sub-rule is used to screen terminal devices, the server 400 detects the execution frequency of each of the multiple second terminal devices related to the user identifier, where the execution frequency refers to the number of times that the second terminal device has executed similar voice commands in historical behaviors.
  • the server 400 will reserve the second terminal device with the highest execution frequency.
  • When the distance from the first terminal device sub-rule is used to screen terminal devices, the server 400 detects the distance between each of the multiple second terminal devices related to the user identifier and the first terminal device. The server 400 retains the second terminal device that is closest to the first terminal device.
  • When the terminal device priority sub-rule is used to screen terminal devices, the server 400 retains the second terminal device with the highest priority according to the terminal device priorities set by the user.
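  • The one-by-one application of the second group of rules can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the field names `frequency`, `distance`, and `priority`, the room-level distance values (0 for the same room as the first terminal device, 1 for another room), and the sample numbers are hypothetical and are not taken from this application.

```python
def keep_best(devices, key, highest=True):
    """Keep every device that is tied for the best value of `key`."""
    values = [key(d) for d in devices]
    best = max(values) if highest else min(values)
    return [d for d in devices if key(d) == best]


def by_frequency(devices):
    # Retain the device(s) that executed similar commands most often.
    return keep_best(devices, key=lambda d: d["frequency"])


def by_distance(devices):
    # Retain the device(s) closest to the first terminal device.
    return keep_best(devices, key=lambda d: d["distance"], highest=False)


def by_priority(devices):
    # Retain the device(s) with the highest user-set priority.
    return keep_best(devices, key=lambda d: d["priority"])


def apply_second_rules(devices, sub_rules):
    """Apply sub-rules in priority order until at most one device remains."""
    remaining = list(devices)
    for rule in sub_rules:
        if len(remaining) <= 1:
            break
        remaining = rule(remaining)
    return remaining


candidates = [
    {"name": "smart TV", "frequency": 5, "distance": 0, "priority": 2},
    {"name": "smart speaker", "frequency": 9, "distance": 0, "priority": 1},
    {"name": "smart refrigerator", "frequency": 1, "distance": 1, "priority": 0},
]
selected = apply_second_rules(candidates, [by_distance, by_priority])
print([d["name"] for d in selected])
```

  • With the distance sub-rule ordered before the priority sub-rule, the refrigerator is dropped first and the smart TV is then preferred over the smart speaker, mirroring the media playback example above. If devices are still tied after every sub-rule has run, this sketch returns all of them, and a further tie-breaker would be needed.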
  • When the server responds to the voice command issued by the user and finds that there are currently multiple terminal devices, it can select, based on the filtering rules, the second terminal device that will finally execute the command, so as to avoid multiple rounds of voice interaction between the user and the first terminal device and improve the user experience.
  • the present application also provides a voice control method in some embodiments, the method includes: the server 400 receives a voice instruction including a user ID sent by a first terminal device, and searches for all terminal devices related to the user ID. When there is no terminal device related to the user identifier, the server 400 feeds back a parameter characterizing the absence of a terminal device, so that the first terminal device broadcasts that there is no terminal device executing the voice command. When there is a terminal device related to the user identifier, the server 400 uses preset filtering rules to filter out the most matching terminal device, and feeds back parameters representing the most matching terminal device, so that the first terminal device broadcasts the most matching terminal equipment, and control the most matching terminal equipment to execute the voice instruction.
  • FIG. 22 is a schematic diagram of a scene of a voice control process provided by an embodiment of the present application.
  • the terminal devices in the smart home scenario include terminal device 200-4 (that is, a smart refrigerator), terminal device 200-3 (that is, a smart washing machine), and terminal device 200-5 (that is, a smart display device).
  • The user can record sound through the recording application in the first terminal device (that is, the intelligent terminal 200-1), where the sound mainly expresses the user's control intention, and the control intention does not include the second terminal device to be controlled.
  • The intelligent terminal 200-1 sends the user's recorded data to the server 400, so that the server 400 can recognize the voice command and obtain the specific control information corresponding to the voice command, determine from the control information the second terminal device that the user actually wants to control, and directly control the second terminal device to execute the corresponding control instruction. That is, voice control of the terminal device is realized through the interaction between the server 400 and the intelligent terminal 200-1.
  • Alternatively, the user can record sound through the recording module in a local server 400, such as an Internet of Things terminal, where the sound mainly expresses the user's control intention, and the control intention does not include the second terminal device to be controlled. The local server 400 recognizes the voice command entered by the user to obtain the specific control information corresponding to the voice command, determines from the control information the second terminal device that the user actually wants to control, and directly controls the second terminal device to execute the corresponding control instruction.
  • In this way, the terminal device can be automatically controlled by voice to execute the corresponding control instruction, which makes it convenient for the user to control and use smart home devices and helps improve intelligence and accuracy.
  • a smart home scene may include various terminal devices, and FIG. 22 is only an illustration, and does not specifically limit the type and number of smart devices.
  • the voice control method provided in the embodiment of the present application may be implemented based on a computer device, or a functional module or a functional entity in the computer device.
  • the computer may be a personal computer (personal computer, PC), a server, a mobile phone, a tablet computer, a notebook computer, a mainframe computer, etc., which are not specifically limited in this embodiment of the present application.
  • the voice control method provided in the embodiment of the present application may be implemented based on the above computer device.
  • the voice control process provided by the embodiment of the present application can be implemented based on the above-mentioned computer equipment.
  • This method recognizes the user's voice command and obtains the control information corresponding to the voice command, where the control information includes a function category and a control instruction.
  • Then, according to the pre-established terminal device information table, the first candidate terminal device set corresponding to the function category is determined; next, based on the functional state corresponding to each candidate terminal device in the first candidate terminal device set, the second candidate terminal device set that matches the control instruction is determined; finally, the second terminal device that matches the control instruction is determined from the second candidate terminal device set and is controlled to execute the control instruction. The terminal device is thus automatically controlled by voice to execute the corresponding control instruction, which makes it convenient for the user to control and use smart home devices and helps improve intelligence and accuracy.
  • In order to describe this solution in more detail, the following description is given in conjunction with FIG. 23A in an exemplary manner. It can be understood that the steps involved in FIG. 23A may include more or fewer steps in actual implementation, and the order of these steps may also be different, as long as the voice control method provided in the embodiment of the present application can be realized.
  • FIG. 23A is a schematic flowchart of a voice control method provided in an embodiment of the present application
  • FIG. 23B is a schematic diagram of a principle of a voice control method provided in an embodiment of the present application.
  • This embodiment is applicable to the situation of controlling each terminal device included in the smart home scene.
  • The method of this embodiment can be executed by a voice control device, which can be implemented in hardware and/or software and can be configured in a computer device.
  • the method specifically includes the following steps:
  • control information includes a function category and a control command.
  • the voice command can be understood as the data formed after the user records.
  • the control information can be understood as the control intention corresponding to the user's voice command, which includes the function category and control instructions related to the terminal device, but does not include the specific second terminal device to be controlled.
  • Terminal devices can be understood as various devices included in smart home scenarios, such as audio and video equipment, lighting systems, curtain control, air conditioning control, digital theater systems, audio and video servers, and network appliances.
  • the function category can be understood as the category to which the specific functions of the terminal device belong.
  • the category corresponding to the smart TV may include: volume, brightness, video playback scene, and recipe scene.
  • the control instruction can be understood as an operation instruction related to the terminal device, such as opening, closing, playing, and pausing.
  • each terminal device is in a different control state, and the user needs to specify the specific control state of each terminal device to control the terminal device.
  • For voice commands that do not specify a terminal device, this may lead to execution failure during the voice control process, or to the need to guide the user to supplement information multiple times in order to determine the terminal device to be controlled.
  • the execution subject in this embodiment may be a local control device 200 with processing and interaction functions, such as an Internet of Things terminal, or a server 400 that interacts with the smart terminal 300 .
  • Because the local control device 200 and the server 400 cannot directly obtain the specific information contained in the voice command, the user's voice command needs to be recognized. This can be done through a speech recognition method and a semantic understanding method, or through a method such as a neural network model or a speech recognition system, which is not specifically limited in this embodiment. After recognition, the control information corresponding to the voice command can be obtained.
  • S2302. Determine a first set of candidate terminal devices corresponding to the function category according to the pre-established terminal device information table.
  • the terminal device information table can be understood as a pre-established table related to the information corresponding to each terminal device in the smart home scene, and the table can include the device identification number, device name, function category and function status of each terminal device, etc. .
  • the first set of candidate terminal devices can be understood as a set of terminal devices included in the smart home scene that match the function category.
  • The first candidate terminal device set corresponding to the function category in the control information can be obtained.
  • The second set of candidate terminal devices can be understood as a set of terminal devices included in the smart home scene that match the control instruction; this set is the candidate set for the second terminal device that the user wants to control.
  • The first candidate terminal device set may contain multiple candidate terminal devices, and each candidate terminal device may be in a different functional state. Therefore, after the first candidate terminal device set is obtained, the range needs to be narrowed further in order to determine the terminal device that the user wants to control. To do so, the functional state corresponding to each candidate terminal device in the first candidate terminal device set is compared with the control instruction in the control information, and the candidate terminal devices that match the control instruction form the second set of candidate terminal devices.
  • For example, if the functional state corresponding to candidate terminal device 1 is normal, the functional state corresponding to candidate terminal device 2 is playing, and the functional state corresponding to candidate terminal device 3 is off, then candidate terminal device 3 is added to the second set of candidate terminal devices.
  • the second terminal device may be understood as a terminal device that matches the control instruction.
  • Because the second candidate terminal device set may include multiple terminal devices, the second terminal device that matches the control instruction needs to be determined from the second candidate terminal device set. The number of second terminal devices may be more than one, depending on the specific circumstances, and this application does not specifically limit it. After the second terminal device is determined, the corresponding control instruction is sent to it to control it to execute the control instruction, so as to meet the user's needs and accurately control the second terminal device in accordance with the user's voice command.
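  • The two-stage narrowing, first by function category and then by functional state, can be sketched in Python as follows. The table rows, the field names, and in particular the `APPLICABLE_STATES` mapping (which functional states a control instruction applies to) are assumptions made for illustration; this application does not specify how a functional state is compared with a control instruction, and the instruction "open" is assumed here only to reproduce the example with three candidate terminal devices.

```python
# Hypothetical terminal device information table (one row per device function).
DEVICE_TABLE = [
    {"device": "candidate terminal device 1", "category": "video_playback", "state": "normal"},
    {"device": "candidate terminal device 2", "category": "video_playback", "state": "playing"},
    {"device": "candidate terminal device 3", "category": "video_playback", "state": "off"},
    {"device": "smart lamp", "category": "brightness", "state": "off"},
]

# Assumption: the functional states in which each control instruction makes sense.
APPLICABLE_STATES = {
    "open": {"off"},
    "close": {"normal", "playing"},
    "pause": {"playing"},
}


def first_candidate_set(table, function_category):
    """Rows whose function category matches the one in the control information."""
    return [row for row in table if row["category"] == function_category]


def second_candidate_set(first_set, control_instruction):
    """Rows of the first set whose functional state matches the control instruction."""
    states = APPLICABLE_STATES.get(control_instruction, set())
    return [row for row in first_set if row["state"] in states]


first = first_candidate_set(DEVICE_TABLE, "video_playback")
second = second_candidate_set(first, "open")
print([row["device"] for row in second])
```

  • Keeping the state comparison in a separate lookup table means that new control instructions can be supported without changing either filtering function.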
  • FIG. 23C is a schematic diagram of the process of determining the second set of candidate terminal devices in the embodiment of the present application, as shown in FIG. 23C:
  • each terminal device is Dev1, Dev2, Dev3, ...;
  • the total set of functional categories is defined as F, and each function is F1, F2, F3, ...;
  • the total set of functional states is defined as S, and the functional states are respectively S1, S2, S3, . . .
  • Step 4: Query the sets in step 2, and determine the second set of candidate terminal devices according to the terminal devices corresponding to the sets whose elements are the same as FxSy.
  • The user's voice command is first recognized to obtain the control information corresponding to the voice command, where the control information includes a function category and a control instruction.
  • Then, according to the pre-established terminal device information table, the first set of candidate terminal devices corresponding to the function category is determined; next, based on the functional state corresponding to each candidate terminal device in the first set of candidate terminal devices, the second set of candidate terminal devices that matches the control instruction is determined; finally, the second terminal device that matches the control instruction is determined from the second set of candidate terminal devices and is controlled to execute the control instruction. The terminal device is thus automatically controlled by voice to execute the corresponding control instruction, which makes it convenient for the user to control and use smart home devices and helps improve intelligence and accuracy.
  • the terminal device information table is obtained in the following manner:
  • the preset scene can be understood as a scene that includes multiple terminal devices and the multiple terminal devices are interconnected through a network, such as a smart home scene, a smart office scene, and the like.
  • The device name, function name, function category, and function status corresponding to each terminal device included in the preset scene can be obtained through terminal device information reporting, or in other ways. After the information corresponding to each terminal device is obtained, the corresponding terminal device information table can be established according to all the device names, function names, function categories, and function states.
  • The terminal device information table can also be updated in time after at least one of the device name, function name, function category, and function status changes.
  • In this way, the terminal device information table stays consistent with the actual functional status of each terminal device, which facilitates determining the first set of candidate terminal devices and ensures the accuracy of that set.
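  • A minimal Python sketch of such a table is given below, assuming that each report carries the four fields named above and that a row is identified by the pair of device name and function name. The class name, method names, and sample values are hypothetical.

```python
class TerminalDeviceInfoTable:
    """In-memory table keyed by (device name, function name)."""

    def __init__(self):
        self._rows = {}

    def report(self, device_name, function_name, function_category, function_state):
        """Insert a new row, or overwrite the existing row when a device reports a change."""
        self._rows[(device_name, function_name)] = {
            "device_name": device_name,
            "function_name": function_name,
            "function_category": function_category,
            "function_state": function_state,
        }

    def state_of(self, device_name, function_name):
        """Return the latest reported functional state of one device function."""
        return self._rows[(device_name, function_name)]["function_state"]

    def devices_in_category(self, function_category):
        """Return the sorted names of devices that offer the given function category."""
        return sorted(
            {
                row["device_name"]
                for row in self._rows.values()
                if row["function_category"] == function_category
            }
        )


table = TerminalDeviceInfoTable()
table.report("smart TV", "video player", "video playback scene", "off")
table.report("smart speaker", "speaker volume", "volume", "normal")
# The TV starts playing and reports again, so its row is updated in place.
table.report("smart TV", "video player", "video playback scene", "playing")
print(table.state_of("smart TV", "video player"))
```

  • Because a repeated report overwrites the earlier row, the table always reflects the most recently reported state, which is the consistency property the paragraph above relies on.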
  • the method further includes:
  • If the first set of candidate terminal devices is an empty set, or if the first set of candidate terminal devices is a non-empty set and the second set of candidate terminal devices is an empty set, send second prompt information, where the second prompt information is used to instruct the user to determine the second terminal device from multiple terminal devices;
  • the second response information includes second identification information corresponding to the second terminal device
  • The first set of candidate terminal devices being an empty set can be understood as meaning that no candidate terminal device in the set meets the conditions. Likewise, the second set of candidate terminal devices being an empty set means that there is no qualifying terminal device in that set.
  • In these cases, the second prompt information can be sent. For example, the local control device 200 can send the second prompt information to its own display screen or audio application, which displays or plays it to instruct the user to determine the second terminal device from the multiple terminal devices; or the server 400 sends the second prompt information to the smart terminal 300 to instruct the user to determine the second terminal device from the multiple terminal devices.
  • the second response information fed back by the user is received. Since the second response information includes the second identification information corresponding to the second terminal device, the second terminal device corresponding to the second identification information can be directly controlled to execute the control instruction.
  • the second terminal device can be determined through the above method, so as to meet the control requirement of the user and improve the user experience.
  • FIG. 24A is a schematic flow chart of another voice control method provided by the embodiment of the present application
  • FIG. 24B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application.
  • This embodiment is further expanded and optimized on the basis of the foregoing embodiments.
  • a possible implementation of S2304 in this embodiment is as follows:
  • Because the second candidate terminal device set may contain multiple second candidate terminal devices, in order to determine the second terminal device that matches the control instruction, it is necessary to determine the number of second candidate terminal devices included in the second candidate terminal device set.
  • S23042 Determine the second terminal device from the second candidate terminal device set according to the relationship between the quantity and the preset threshold, and control the second terminal device to execute the control instruction.
  • the preset threshold may be a preset value, such as 1, 3, etc., and may also be determined according to specific circumstances, which is not specifically limited in this embodiment.
  • The second terminal device is determined from the second candidate terminal device set; for example, all the second candidate terminal devices included in the set are second terminal devices, or only some of the second candidate terminal devices included in the set are second terminal devices. After the second terminal device is determined, it is controlled to execute the control instruction, so that smart home control is implemented through voice and user operations are reduced.
  • determining the second terminal device through the above method is simple and quick, and can improve work efficiency.
  • FIG. 25A is a schematic flowchart of another voice control method provided by the embodiment of the present application
  • FIG. 25B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application.
  • This embodiment is further expanded and optimized on the basis of the foregoing embodiments.
  • a possible implementation of S23042 in this embodiment is as follows:
  • If the number of second candidate terminal devices is less than or equal to the preset threshold, the number does not exceed the upper limit. Therefore, all second candidate terminal devices included in the second candidate terminal device set are determined as second terminal devices.
  • a control instruction needs to be sent to each second terminal device, so as to control each second terminal device to respectively execute the control instruction.
  • The local control device 200 sends the first prompt information; for example, it can send the first prompt information to its own display screen or audio application, which displays or plays it to instruct the user to determine the second terminal device from multiple terminal devices; or the server 400 sends the first prompt information to the smart terminal device 204 to instruct the user to determine the second terminal device from multiple terminal devices.
  • the first response information fed back by the user is received, so as to subsequently control the second terminal device corresponding to the first identification information to execute the control instruction.
  • the first response information includes the first identification information corresponding to the second terminal device, it is possible to directly control the second terminal device corresponding to the first identification information to execute the control instruction.
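  • The threshold comparison, together with the empty-set case handled by the second prompt information, can be sketched in Python as follows. The function name, the `ask_user` callback (which stands in for sending prompt information and receiving the identification information in the user's response), and the choice of which devices are offered to the user are assumptions made for illustration.

```python
def resolve_targets(second_set, all_devices, threshold, ask_user):
    """Return the identifiers of the second terminal device(s) to control."""
    if not second_set:
        # No qualifying candidate: second prompt, the user picks among all devices.
        return [ask_user("second prompt", all_devices)]
    if len(second_set) <= threshold:
        # Within the upper limit: every candidate becomes a second terminal device.
        return list(second_set)
    # Too many candidates: first prompt, the user picks one of them.
    return [ask_user("first prompt", second_set)]


def pick_first(prompt, options):
    """Stand-in for a user response that names the first offered device."""
    return options[0]


all_devices = ["lamp A", "lamp B", "smart TV"]
print(resolve_targets(["lamp A", "lamp B"], all_devices, 3, pick_first))
print(resolve_targets(["lamp A", "lamp B"], all_devices, 1, pick_first))
```

  • With a threshold of 3 both lamps are controlled, whereas with a threshold of 1 the user is asked to choose, which shows how the preset threshold trades convenience against the risk of controlling unintended devices.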
  • FIG. 26A is a schematic flowchart of another voice control method provided by the embodiment of the present application
  • FIG. 26B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application.
  • This embodiment is further expanded and optimized on the basis of the foregoing embodiments.
  • a possible implementation of S2301 in this embodiment is as follows:
  • the speech recognition method is a method for converting speech into text, such as speech recognition software.
  • the speech recognition method can perform text recognition on the speech instruction, so as to obtain the text information corresponding to the speech instruction.
  • S23012 perform semantic understanding on the text information by using a semantic understanding method, and obtain control information contained in the text information, where the control information includes function categories and control instructions.
  • the semantic understanding method may include a keyword extraction method, an information extraction method, and the like.
  • the text information recognized by the machine may contain redundant or repeated information. To further improve the accuracy of the recognition process, semantic understanding is performed on the text information through the semantic understanding method to obtain the control information contained in the text information, where the control information includes function categories and control instructions.
  • control information obtained through the above method is more accurate and more in line with the actual situation, which is beneficial to ensure the smooth progress of the subsequent process.
  • FIG. 26C is a schematic diagram of the process of obtaining control information in the embodiment of the present application, as shown in FIG. 26C:
  • voice recognition is performed on the voice command to obtain the first information
  • semantic understanding of the first information is performed to obtain the control information.
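As an illustration of the semantic understanding step, the following sketch maps recognized text to control information using simple keyword extraction. It assumes the text has already been produced by a speech recognition step, and the keyword table, the `ControlInfo` type, and the `understand` function are hypothetical names chosen for this example; the phrases and their categories are taken from the examples given in this description.

```python
from typing import NamedTuple, Optional


class ControlInfo(NamedTuple):
    function_category: str
    control_instruction: str


# Hypothetical keyword table, for illustration only.
KEYWORD_RULES = {
    "too dark": ControlInfo("brightness", "increase"),
    "too loud": ControlInfo("volume", "lower"),
    "close the door": ControlInfo("door", "close"),
}


def understand(text: str) -> Optional[ControlInfo]:
    """Extract control information from recognized text via keyword matching."""
    # Normalize case and whitespace so redundant spacing does not block a match.
    normalized = " ".join(text.lower().split())
    for keyword, info in KEYWORD_RULES.items():
        if keyword in normalized:
            return info
    return None  # no known keyword found
```

A production system would likely use a trained language-understanding model instead of a fixed table; the table only shows the shape of the input and output.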
  • FIG. 27A is a schematic structural diagram of a local control device in the embodiment of the present application, as shown in FIG. 27A:
  • the local control device 200 includes a speech recognition service, a semantic understanding service, and a home control service.
  • the speech recognition service is mainly used for recording and recognizing the user's voice command to obtain the recognition result
  • the semantic understanding service is mainly used for determining the control information according to the recognition result
  • the home control service is used for maintaining the terminal device information table, receiving the device information reported by the terminal devices, and controlling the corresponding terminal device according to the control information.
  • FIG. 27B is a schematic structural diagram of interaction between a local control device and a terminal device in the embodiment of the present application, as shown in FIG. 27B:
  • the voice recognition service includes a recording module and a recognition engine, wherein the recording module is used for recording, and the recognition engine is used for recognition according to the user's voice command to obtain a recognition result.
  • Semantic understanding services include functional categories and control instructions.
  • the home control service includes a terminal device information table, determining a second terminal device, and voice command control.
  • the home control service interacts with each terminal device, such as terminal device A, terminal device B, ..., terminal device N. The home control service obtains the terminal device information table according to the device information reported by each terminal device; determines the first candidate terminal device set according to the terminal device information table and the function category; determines the second candidate terminal device set according to the first candidate terminal device set and the control instruction; determines the second terminal device from the second candidate terminal device set; and controls the second terminal device to execute the control instruction, so as to realize control of the smart home through voice commands.
  • the terminal device is responsible for reporting device information and receiving and executing control instructions issued by the home control service.
  • terminal device information table is as shown in Table 2 below:
  • the brightness and volume of the TV are normal and the menu UI is being displayed;
  • the smart speaker is playing music
  • the refrigerator door is open.
  • each device included in Table 1 is a terminal device.
  • Example 1: Suppose the voice command is "too dark", the function category is brightness, the control instruction is increase, and the preset threshold is 1. According to the function category and Table 1, the first candidate terminal device set is determined to be: (1) curtain and (2) TV. Then, according to the functional status of each terminal device in the first candidate terminal device set and the control instruction, the second candidate terminal device set is determined to be: (1) curtain. Because the number of second candidate terminal devices in the second candidate terminal device set is equal to the preset threshold, the second terminal device is determined to be the curtain, and the curtain is controlled to perform the opening function.
  • Example 2: Suppose the voice command is "I want to grill a steak", the function categories are ingredient scene and recipe scene, the control instructions are cooking and query, and the preset threshold is 1. According to the function categories and Table 1, the first candidate terminal device set is determined to be: (4) TV and (6) oven. Then, according to the functional status of each terminal device in the first candidate terminal device set and the control instructions, the second candidate terminal device set is determined to be: (6) oven. Because the number of second candidate terminal devices in the second candidate terminal device set is equal to the preset threshold, the second terminal device is determined to be the (6) oven, and the oven is controlled to perform the steak grilling function.
  • the TV will introduce the grilled steak recipe.
  • Example 3: Suppose the voice command is "too loud", the function category is volume, the control instruction is lower, and the preset threshold is 1. According to the function category and Table 1, the first candidate terminal device set is determined to be: (3) TV and (5) smart speaker. Then, according to the functional status of each terminal device in the first candidate terminal device set and the control instruction, the second candidate terminal device set is determined to be: (5) smart speaker. Because the number of second candidate terminal devices in the second candidate terminal device set is equal to the preset threshold, the second terminal device is determined to be the (5) smart speaker, and the smart speaker is controlled to perform the volume down function.
  • the user is prompted to select whether the device to turn down the volume is a TV or a sound box.
  • Example 4: Suppose the voice command is "close the door", the function category is door, the control instruction is close, and the preset threshold is 1. According to the function category and Table 1, the first candidate terminal device set is determined to be: (7) refrigerator and (8) TV. Then, according to the functional status of each terminal device in the first candidate terminal device set and the control instruction, the second candidate terminal device set is determined to be: (7) refrigerator. Because the number of second candidate terminal devices in the second candidate terminal device set is equal to the preset threshold, the second terminal device is determined to be the (7) refrigerator, and the refrigerator is controlled to execute the door closing function.
  • the user is prompted whether the device to close the door is a refrigerator or an oven.
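The two-stage screening used in the examples above can be sketched as follows. The device table in this sketch is hypothetical: the table referenced by the examples is not reproduced here, so the entries, the `applicable` field (an assumed way of encoding "the current functional status makes this instruction meaningful"), and the assumption that the curtain is currently closed are illustrative only.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class DeviceEntry:
    name: str
    function_category: str
    # Instructions that make sense in the device's current state (assumed encoding).
    applicable: FrozenSet[str]


# Hypothetical terminal device information table.
DEVICE_TABLE = [
    DeviceEntry("curtain", "brightness", frozenset({"increase"})),  # assumed closed
    DeviceEntry("TV", "brightness", frozenset()),                   # brightness normal
    DeviceEntry("TV", "volume", frozenset()),                       # volume normal
    DeviceEntry("smart speaker", "volume", frozenset({"lower", "raise"})),  # playing music
    DeviceEntry("refrigerator", "door", frozenset({"close"})),      # door open
]


def first_candidates(table: List[DeviceEntry], category: str) -> List[DeviceEntry]:
    """Stage 1: keep entries whose function category matches the command."""
    return [entry for entry in table if entry.function_category == category]


def second_candidates(first: List[DeviceEntry], instruction: str) -> List[DeviceEntry]:
    """Stage 2: keep entries whose current state admits the control instruction."""
    return [entry for entry in first if instruction in entry.applicable]


# "too dark": function category brightness, control instruction increase.
stage1 = first_candidates(DEVICE_TABLE, "brightness")
stage2 = second_candidates(stage1, "increase")
```

The size of the stage 2 result is then compared with the preset threshold to decide whether to control the devices directly or to prompt the user.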
  • the embodiment of the present application provides an electronic device that includes at least a processor and a memory, where the processor is configured to implement any one of the above voice control methods for a terminal device when executing the computer program stored in the memory.
  • the embodiment of the present application provides a computer-readable non-volatile storage medium that stores a computer program, and when the computer program is executed by a processor, it implements any one of the voice control methods for a terminal device described above.

Abstract

The present application discloses a terminal device and a server for voice control. The server of the present embodiment receives a voice signal transmitted by a first terminal device, generates a voice instruction according to the voice signal, and feeds back the voice instruction to the first terminal device. If the first terminal device can execute an operation corresponding to the voice instruction, the corresponding operation is executed in response to the voice instruction. If the first terminal device cannot execute the operation corresponding to the voice instruction, an instruction distribution request is transmitted to the server. The server transmits the voice instruction to a second terminal device according to the instruction distribution request, so that the second terminal device executes the corresponding operation in response to the voice instruction, wherein the second terminal device can execute the operation corresponding to the voice instruction.

Description

A terminal device and server for voice control
Cross-Reference to Related Applications
This application claims priority to the following Chinese patent applications: application No. 202110688867.0, filed on June 22, 2021; application No. 202110917713.4, filed on August 11, 2021; application No. 202111521226.2, filed on December 13, 2021; and application No. 202210151526.4, filed on February 18, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of voice interaction, and in particular to a terminal device and a server for voice control.
Background
With the development of voice interaction technology, more and more household terminal devices provide a voice interaction function. Using this function, a user can control these terminal devices by voice to perform corresponding operations, such as starting and stopping.
At present, voice control of a terminal device works as follows: the user inputs a voice signal, and after collecting the voice signal, the terminal device converts it into a corresponding instruction, so that the terminal device performs the corresponding operation according to the instruction.
However, the voice interaction function of most current terminal devices is limited by distance. A user cannot control a desired device from an arbitrary location indoors. For example, the user cannot turn the bedroom smart TV on or off by voice from the kitchen, or adjust the temperature of the bedroom air conditioner by voice from the living room. To control a terminal device, the user has to move within its effective range or speak louder, which results in a poor user experience.
Summary
This embodiment provides a terminal device and a server for voice control. The terminal device includes: a sound collector configured to collect a voice signal input by a user; and a controller configured to: receive the voice signal input by the user from the sound collector, send the voice signal to a server, and receive a voice instruction from the server, where the voice instruction is generated according to the voice signal; when the terminal device can perform the operation corresponding to the voice instruction, perform that operation in response to the voice instruction; and when the terminal device cannot perform the operation corresponding to the voice instruction, generate an instruction distribution request and send the instruction distribution request to the server, so that the server, according to the instruction distribution request, searches for other terminal devices that can perform the operation corresponding to the voice instruction and sends the voice instruction to those terminal devices.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a voice interaction principle provided by an embodiment of the present application;
FIG. 2 is a schematic framework diagram of a voice control system for terminal devices provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a scenario of a voice control system for terminal devices provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of another scenario of a voice control system for terminal devices provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of another scenario of a voice control system for terminal devices provided by an embodiment of the present application;
FIG. 6 is a signaling diagram of a voice control method for terminal devices provided by an embodiment of the present application;
FIG. 7 is a usage scenario diagram of a voice control system provided by an embodiment of the present application;
FIG. 8 is a hardware configuration diagram of a terminal device provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a voice interaction process provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of the effect of multiple terminal devices responding to a voice interaction provided by an embodiment of the present application;
FIG. 11 is a schematic flowchart of a multi-device voice wake-up method provided by an embodiment of the present application;
FIG. 12 is a schematic flowchart of screening for a second terminal device provided by an embodiment of the present application;
FIG. 13 is a schematic flowchart of determining a second terminal device according to the number of devices provided by an embodiment of the present application;
FIG. 14 is a schematic flowchart of marking a master device provided by an embodiment of the present application;
FIG. 15 is a schematic flowchart of updating device status provided by an embodiment of the present application;
FIG. 16 is a server-side sequence flowchart of a multi-device voice wake-up method provided by an embodiment of the present application;
FIG. 17 is a terminal-device-side sequence flowchart of a multi-device voice wake-up method provided by an embodiment of the present application;
FIG. 18 is an application scenario diagram of a terminal device provided by an embodiment of the present application;
FIG. 19 is another application scenario diagram of a terminal device provided by an embodiment of the present application;
FIG. 20 is a schematic flowchart of a voice control method provided by an embodiment of the present application;
FIG. 21 is a schematic flowchart of screening terminal devices provided by an embodiment of the present application;
FIG. 22 is a schematic diagram of a scenario of a voice control process provided by an embodiment of the present application;
FIG. 23A is a schematic flowchart of a voice control method provided by an embodiment of the present application;
FIG. 23B is a schematic diagram of the principle of a voice control method provided by an embodiment of the present application;
FIG. 23C is a schematic diagram of the process of determining the second candidate terminal device set in an embodiment of the present application;
FIG. 24A is a schematic flowchart of another terminal home control method provided by an embodiment of the present application;
FIG. 24B is a schematic diagram of the principle of another voice control method provided by an embodiment of the present application;
FIG. 25A is a schematic flowchart of another voice control method provided by an embodiment of the present application;
FIG. 25B is a schematic diagram of the principle of another voice control method provided by an embodiment of the present application;
FIG. 26A is a schematic flowchart of another voice control method provided by an embodiment of the present application;
FIG. 26B is a schematic diagram of the principle of another voice control method provided by an embodiment of the present application;
FIG. 26C is a schematic diagram of the process of obtaining control information in an embodiment of the present application;
FIG. 27A is a schematic structural diagram of a local control device in an embodiment of the present application;
FIG. 27B is a schematic structural diagram of interaction between a local control device and terminal devices in an embodiment of the present application.
Detailed Description
To make the purpose and implementations of the present application clearer, the exemplary implementations of the present application are described clearly and completely below with reference to the accompanying drawings of the exemplary embodiments. Obviously, the described exemplary embodiments are only some of the embodiments of the present application, not all of them.
It should be noted that the brief explanations of terms in the present application are only intended to facilitate understanding of the implementations described below, and are not intended to limit the implementations of the present application. Unless otherwise stated, these terms are to be understood according to their ordinary and usual meaning.
To clearly illustrate the embodiments of the present application, a speech recognition network architecture provided by an embodiment of the present application is described below with reference to FIG. 1.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a voice interaction principle provided by an embodiment of the present application. In FIG. 1, the smart device is used to receive input information and to output the processing result of that information. The speech recognition service device is an electronic device on which a speech recognition service is deployed, the semantic service device is an electronic device on which a semantic service is deployed, and the business service device is an electronic device on which a business service is deployed. The electronic devices here may include servers, computers, and the like. The speech recognition service, the semantic service (also called a semantic engine), and the business service are web services that can be deployed on these electronic devices: the speech recognition service recognizes audio as text, the semantic service performs semantic parsing on the text, and the business service provides a specific service, such as the weather query service of Moji Weather or the music query service of QQ Music. In one embodiment, the architecture shown in FIG. 1 may contain multiple physical service devices on which different business services are deployed, or one or more functional services may be integrated into one or more physical service devices.
In some embodiments, the process of handling information input to the smart device based on the architecture shown in FIG. 1 is described below by way of example. Taking a query sentence input by voice as the information input to the smart device, the process may include the following three stages:
[Speech recognition]
After receiving a query sentence input by voice, the smart device may upload the audio of the query sentence to the speech recognition service device, which recognizes the audio as text through the speech recognition service and returns the text to the smart device. In one embodiment, before uploading the audio of the query sentence to the speech recognition service device, the smart device may denoise the audio; the denoising may include steps such as removing echo and environmental noise.
[Semantic understanding]
The smart device uploads the text of the query sentence recognized by the speech recognition service to the semantic service device, which performs semantic parsing on the text through the semantic service to obtain the business domain, intent, and so on of the text.
[Semantic response]
According to the semantic parsing result of the text of the query sentence, the semantic service device sends a query instruction to the corresponding business service device to obtain the query result given by the business service. The smart device may obtain the query result from the semantic service device and output it. As an embodiment, the semantic service device may also send the semantic parsing result of the query sentence to the smart device, so that the smart device outputs the feedback sentence contained in the semantic parsing result.
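The three stages above can be sketched as a simple chain of calls. This sketch collapses the network hops between the smart device and the service devices into local function calls, and every name in it (`handle_query`, `Parse`, the `recognize`, `parse`, and `services` parameters) is a hypothetical stand-in rather than an interface defined by the application.

```python
from typing import Callable, Dict, NamedTuple


class Parse(NamedTuple):
    domain: str  # business domain of the query, e.g. "weather"
    intent: str  # what the user wants within that domain


def handle_query(
    audio: bytes,
    recognize: Callable[[bytes], str],
    parse: Callable[[str], Parse],
    services: Dict[str, Callable[[str], str]],
) -> str:
    """Run speech recognition, semantic understanding, and semantic response."""
    text = recognize(audio)            # speech recognition: audio -> text
    result = parse(text)               # semantic understanding: text -> domain/intent
    service = services[result.domain]  # route to the matching business service
    return service(result.intent)      # semantic response: query result to output
```

Passing the services in as a dictionary keyed by business domain mirrors the idea that the semantic parsing result decides which business service receives the query.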
It should be noted that the architecture shown in FIG. 1 is only an example and does not limit the protection scope of the present application. In the embodiments of the present application, other architectures may also be used to implement similar functions; for example, all or part of the three stages may be completed by the smart terminal, which is not described in detail here.
In some embodiments, the smart device shown in FIG. 1 may be a display device, such as a smart TV. The function of the speech recognition service device may be implemented jointly by the sound collector and the controller provided on the display device, and the functions of the semantic service device and the business service device may be implemented by the controller of the display device or by the server of the display device.
With the development of voice interaction technology, more and more household terminal devices provide a voice interaction function. Using this function, a user can control these terminal devices by voice to perform corresponding operations, such as starting and stopping.
At present, voice control of a terminal device works as follows: the user inputs a voice signal, and after collecting the voice signal, the terminal device converts it into a corresponding instruction, so that the terminal device performs the corresponding operation according to the instruction.
However, the voice interaction function of most current terminal devices is limited by distance. A user cannot control a desired device from an arbitrary location indoors. For example, the user cannot turn the bedroom smart TV on or off by voice from the kitchen, or adjust the temperature of the bedroom air conditioner by voice from the living room. To control a terminal device, the user has to move within its effective range or speak louder, which results in a poor user experience.
To solve the above problems, the present application provides a voice control system for terminal devices. FIG. 2 is a schematic framework diagram of a voice control system for terminal devices provided by an embodiment of the present application. The system includes at least two terminal devices 200 and a server 400. The terminal device 200 is used to collect the voice signal input by the user and is communicatively connected to the server 400. The server 400 is used to receive signals or requests sent by the terminal device 200 and to feed back corresponding instructions to the terminal device 200.
In some embodiments, the sound collector of the terminal device 200-1 collects the voice signal input by the user. The terminal device 200-1 then sends the collected voice signal to the server 400, and the server 400 generates a voice instruction according to the voice signal. It should be noted that the server 400 uses a semantic system to convert the voice signal into the voice instruction; the specific conversion process is not limited in the present application.
The server 400 then feeds the resulting voice instruction back to the terminal device 200-1. After the terminal device 200-1 receives the voice instruction, its local execution capability module judges whether the device is capable of performing the operation corresponding to the voice instruction. If it is, the voice instruction is sent to the controller, and in response to the voice instruction the controller controls the terminal device 200-1 to perform the corresponding operation.
If the device is not capable of performing the operation corresponding to the voice instruction, an instruction distribution request carrying the voice instruction is generated according to the voice instruction and sent to the server 400. After receiving the instruction distribution request, the server 400 searches for a terminal device 200 that can execute the voice instruction. For example, if the server finds that the terminal device 200-2 can perform the operation corresponding to the voice instruction, it sends the voice instruction to the terminal device 200-2, so that the controller of the terminal device 200-2, in response to the voice instruction, controls the terminal device 200-2 to perform the corresponding operation.
For example, in one scenario the user inputs the voice signal "turn on the TV" near a smart speaker; the TV is in the bedroom, while the smart speaker is in the kitchen. After receiving the voice signal "turn on the TV", the smart speaker sends the voice signal to the server 400. The server converts the voice signal into a voice instruction and feeds the voice instruction back to the smart speaker.
Because the smart speaker cannot perform the "turn on the TV" operation, it sends the server 400 an instruction distribution request carrying the voice instruction "turn on the TV". After receiving the instruction distribution request, the server 400 searches for a terminal device that can execute the voice instruction "turn on the TV". Having found that the TV is such a terminal device, the server sends the voice instruction "turn on the TV" to the TV in the bedroom. After receiving the voice instruction, the bedroom TV performs the power-on operation in response. In this way, the bedroom TV can be turned on by voice even when the user is not in the bedroom.
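The instruction distribution flow in this scenario can be sketched as follows. The sketch is illustrative only: the `Server` and `Terminal` classes, the registration of each device's operations with the server, and the string form of instructions are assumptions, since the description does not specify how the server learns which devices can execute a given instruction.

```python
from typing import Dict, List, Optional, Set, Tuple


class Server:
    def __init__(self) -> None:
        # Assumed registry: device id -> operations the device can perform.
        self.capabilities: Dict[str, Set[str]] = {}
        self.delivered: List[Tuple[str, str]] = []  # (device id, instruction) log

    def register(self, device_id: str, operations: Set[str]) -> None:
        self.capabilities[device_id] = set(operations)

    def distribute(self, requester_id: str, instruction: str) -> Optional[str]:
        """Handle an instruction distribution request from a terminal device."""
        for device_id, operations in self.capabilities.items():
            if device_id != requester_id and instruction in operations:
                self.delivered.append((device_id, instruction))
                return device_id
        return None  # no other device can execute the instruction


class Terminal:
    def __init__(self, device_id: str, operations: Set[str], server: Server) -> None:
        self.device_id = device_id
        self.operations = set(operations)
        self.server = server
        server.register(device_id, operations)

    def on_voice_instruction(self, instruction: str) -> str:
        if instruction in self.operations:
            # The device itself can perform the operation.
            return f"{self.device_id} executes {instruction}"
        # Otherwise ask the server to find a capable device.
        target = self.server.distribute(self.device_id, instruction)
        return f"forwarded to {target}" if target else "no capable device"


server = Server()
speaker = Terminal("speaker-kitchen", {"play music"}, server)
tv = Terminal("tv-bedroom", {"turn on the TV"}, server)
outcome = speaker.on_voice_instruction("turn on the TV")
```

Here the kitchen speaker cannot turn on a TV, so the request goes to the server, which forwards the instruction to the bedroom TV.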
In some embodiments, each terminal device is configured with a local execution capability filtering module, and the local execution capability filtering module is configured with local capability attribute parameters. The terminal device determines whether it is capable of performing the operation corresponding to a voice instruction through the following steps:
The pending capability attribute parameter, which represents the operation corresponding to the voice instruction, is parsed from the voice instruction. The local capability attribute parameters are then matched against the pending capability attribute parameter. If they match, the terminal device can perform the operation corresponding to the voice instruction; if they do not match, the terminal device cannot perform that operation.
For example, suppose the terminal device 200-1 is a display device, the terminal device 200-2 is an air conditioner, the terminal device 200-3 is a washing machine, and the terminal device 200-4 is a refrigerator. Then the local capability attribute parameter of the terminal device 200-1 is playing audio and video, the local capability attribute parameters of the terminal device 200-2 are cooling and heating, the local capability attribute parameter of the terminal device 200-3 is laundry, and the local capability attribute parameter of the terminal device 200-4 is cooling.
If the user inputs the voice signal "heating" within the signal receiving range of the terminal device 200-2, the terminal device 200-2 collects the voice signal and receives the voice command "heating" sent by the server 400. The pending capability attribute parameter parsed from the voice command is "heating". The local capability filtering module of the terminal device 200-2 holds the local capability attribute parameter "heating", so the local capability attribute parameter matches the pending capability attribute parameter. The terminal device 200-2 can therefore perform the operation corresponding to the voice command "heating".
It should be noted that the pending capability attribute parameter may not match the local capability attribute parameter exactly as text. In that case, the local capability filtering module of the present application can also convert the parsed pending capability attribute parameter. For example, if the user inputs the voice signal "warm up" within the signal receiving range of the terminal device 200-2, the text obtained after parsing by the local capability filtering module is "warm up". This pending capability attribute parameter does not exactly match the local capability attribute parameter "heating" of the terminal device 200-2. The local capability filtering module can analyze the pending capability attribute parameter and determine that "warm up" has the same meaning as "heating". The pending capability attribute parameter is therefore treated as matching the local capability attribute parameter, and the terminal device 200-2 can perform the operation corresponding to the voice signal "warm up".
If the user inputs the voice signal "play music" within the signal receiving range of the terminal device 200-2, the terminal device 200-2 collects the voice signal and receives the voice command "play music" sent by the server 400. The pending capability attribute parameter parsed from the voice command is "play music". The local capability filtering module of the terminal device 200-2 holds the local capability attribute parameters "cooling" and "heating", so the local capability attribute parameters do not match the pending capability attribute parameter. The terminal device 200-2 therefore cannot perform the operation corresponding to the voice command "play music".
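The matching performed by the local capability filtering module can be sketched as follows. This is a minimal illustration, not the implementation of the present application: the class name `LocalCapabilityFilter`, the `SYNONYMS` table, and the assumption that the command text itself names the capability are all hypothetical.

```python
# Hypothetical synonym table: maps alternative wordings to a canonical capability.
SYNONYMS = {"warm up": "heating", "cool down": "cooling"}


class LocalCapabilityFilter:
    """Illustrative stand-in for the local capability filtering module."""

    def __init__(self, local_capabilities):
        # Local capability attribute parameters of this device.
        self.local_capabilities = set(local_capabilities)

    def parse_pending_capability(self, voice_command: str) -> str:
        # Assumption: the command text directly names the requested capability.
        text = voice_command.strip().lower()
        # Convert wordings that mean the same thing to the canonical form.
        return SYNONYMS.get(text, text)

    def can_execute(self, voice_command: str) -> bool:
        pending = self.parse_pending_capability(voice_command)
        return pending in self.local_capabilities


air_conditioner = LocalCapabilityFilter({"cooling", "heating"})
print(air_conditioner.can_execute("heating"))     # exact textual match
print(air_conditioner.can_execute("warm up"))     # match after synonym conversion
print(air_conditioner.can_execute("play music"))  # no match
```

A lookup table is the simplest possible conversion step; a real module might instead rely on semantic analysis to decide that two wordings are equivalent.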
In some embodiments, if the voice command includes only a device name, the server, according to the instruction distribution request, searches for the second terminal device corresponding to the device name and sends the voice command to the second terminal device, so that the second terminal device performs the corresponding operation in response to the voice command.
FIG. 3 is a schematic diagram of a voice control system scenario for a terminal device provided in an embodiment of the present application. As shown in FIG. 3, the user inputs voice commands such as "turn on device 3", "device 3 status", and "turn off device 3" within the signal receiving range of device 1. These voice commands include only the device name "device 3". Because the device name of device 1 does not match the device name included in the voice command, device 1 cannot perform the operation corresponding to the voice command. The server searches by device name for a terminal device with a matching name. It finds that the device name of device 3 matches the device name included in the voice command, and sends the voice command to device 3. After receiving the voice command "turn on device 3", device 3 starts up in response. Alternatively, after receiving the voice command "turn off device 3", device 3 shuts down in response.
In some embodiments, after the server finds the second terminal device and sends it the voice command, the second terminal device can also use its local capability filtering module to reconfirm whether it can perform the operation corresponding to the voice command. If it confirms that it can, it performs the corresponding operation in response to the voice command. If it finds that it cannot, the second terminal device can feed back an error signal to the server, so that the server searches again for a terminal device that can perform the operation corresponding to the voice command.
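The reconfirmation step and the error feedback can be sketched as below. The names `Device`, `handle`, and `dispatch_with_confirmation` are hypothetical, and the sketch assumes the server simply tries the next candidate whenever a device reports that it cannot act.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    local_capabilities: set = field(default_factory=set)

    def handle(self, capability: str) -> bool:
        """Re-check locally. True means executed; False means an error is fed back."""
        return capability in self.local_capabilities


def dispatch_with_confirmation(server_records, devices, capability):
    """Try each device the server believes is capable until one confirms."""
    for name, recorded_capabilities in server_records.items():
        if capability not in recorded_capabilities:
            continue
        if devices[name].handle(capability):
            return name
        # The device fed back an error, so the server keeps searching.
    return None


devices = {
    "device 2": Device("device 2", {"cooling"}),
    "device 3": Device("device 3", {"cooling", "play music"}),
}
# Assumption for illustration: the server's record for device 2 is out of date.
server_records = {
    "device 2": {"cooling", "play music"},
    "device 3": {"play music"},
}
print(dispatch_with_confirmation(server_records, devices, "play music"))  # device 3
```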
In some embodiments, if the voice command includes only a device capability, the server, according to the instruction distribution request, searches for a second terminal device that has the device capability and sends the voice command to the second terminal device, so that the second terminal device performs the corresponding operation in response to the voice command.
FIG. 4 is a schematic diagram of another voice control system scenario for a terminal device provided in an embodiment of the present application. As shown in FIG. 4, the user inputs voice commands such as "lower the temperature", "set the temperature to 20 degrees", "raise the temperature", and "increase the fan speed" within the signal receiving range of the speaker device. These voice commands include only device capabilities. Because the local capability attribute parameters of the speaker device do not match the device capability parameters included in these voice commands, the speaker device cannot perform the corresponding operations. The speaker device sends an instruction distribution request to the server, and the server, according to the request, searches for a terminal device that matches the device capability parameters. Among the devices shown in FIG. 4, only the air conditioner's local capability attribute parameters match. The server therefore sends the voice command to the air conditioner, which performs the corresponding operation in response.
It should be noted that if multiple terminal devices match the device capability parameter included in the voice command, and the voice command includes only the device capability parameter, the server sends the voice command to all of the matching terminal devices. Each of these terminal devices performs the corresponding operation in response to the voice command.
For example, the user inputs "lower the temperature" within the signal receiving range of the air conditioner. The air conditioner's local capability filtering module first determines, from the local capability attribute parameters, that the air conditioner can perform the operation corresponding to the voice command. The air conditioner additionally sends an instruction distribution request to the server. According to the device capability parameter carried in the instruction distribution request, the server searches for terminal devices other than the air conditioner that also match the device capability parameter, that is, terminal devices that can perform the operation corresponding to the voice command "lower the temperature". It finds that the refrigerator can also perform this operation. The server sends the voice command "lower the temperature" to the refrigerator, so that the refrigerator performs the corresponding operation in response. With this embodiment, a single voice command from the user can control multiple terminal devices at the same time.
It should also be noted that a single voice command may be able to control multiple terminal devices when the user does not actually want to control more than one. In that case, the user can input a voice command that includes both a device name and a device capability parameter.
For example, the user inputs "lower the temperature of the air conditioner" within the signal receiving range of the air conditioner. This voice command includes both the device name "air conditioner" and the device capability parameter "lower the temperature". The air conditioner determines through its local capability filtering module that it can perform the operation corresponding to the voice command, and its device name matches the device name carried in the voice command. The air conditioner therefore does not send an instruction distribution request to the server, and the server does not search for other terminal devices.
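The difference between a capability-only command and a command that also names a device can be sketched as a target selection function. The function name `targets_for_command`, the `home` table, and the capability strings are hypothetical, chosen only to mirror the air conditioner and refrigerator example.

```python
def targets_for_command(devices, capability, device_name=None):
    """Return the names of devices that should act on a command.

    devices: mapping of device name -> set of capability strings.
    device_name: optional device name parsed from the voice command.
    """
    capable = [name for name, caps in devices.items() if capability in caps]
    if device_name is not None:
        # A named device narrows the result to that single device.
        return [name for name in capable if name == device_name]
    # Capability-only command: every capable device is a target.
    return capable


home = {
    "air conditioner": {"lower temperature", "raise temperature"},
    "refrigerator": {"lower temperature"},
    "speaker": {"play music"},
}
print(targets_for_command(home, "lower temperature"))
# ['air conditioner', 'refrigerator']
print(targets_for_command(home, "lower temperature", "air conditioner"))
# ['air conditioner']
```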
In some embodiments, if the voice command carries only content covered by a custom rule, the server searches for the matching second terminal device according to the custom rules. In the custom rules, voice commands have a correspondence with terminal devices.
For example, FIG. 5 is a schematic diagram of yet another voice control system scenario for a terminal device provided in an embodiment of the present application. As shown in FIG. 5, the custom rules include: device 2 is preferred for playing music, device 3 is preferred for playing film and TV, device 4 is preferred for playing audiobooks, and so on. That is, the play-music command corresponds to device 2, the play-film-and-TV command corresponds to device 3, and the play-audiobook command corresponds to device 4. When the user inputs the voice command "play music" within the signal receiving range of device 1, the local capability filtering module of device 1 first determines that device 1 cannot perform the operation corresponding to the voice command, and device 1 then sends an instruction distribution request to the server. According to the custom rules, the server finds that the terminal device corresponding to the voice command "play music" is device 2, and sends the voice command to device 2.
When the user inputs the voice command "play film and TV" within the signal receiving range of device 1, the local capability filtering module of device 1 first determines that device 1 cannot perform the operation corresponding to the voice command, and device 1 then sends an instruction distribution request to the server. According to the custom rules, the server finds that the terminal device corresponding to the voice command "play film and TV" is device 3, and sends the voice command to device 3.
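A custom rule table of this kind can be sketched as a plain mapping from command to preferred device. The `CUSTOM_RULES` dictionary and the function `find_by_custom_rule` are hypothetical names, and the sketch assumes the command text is used directly as the lookup key.

```python
# Hypothetical custom rules: voice command -> preferred terminal device.
CUSTOM_RULES = {
    "play music": "device 2",
    "play film and tv": "device 3",
    "play audiobook": "device 4",
}


def find_by_custom_rule(voice_command, rules=CUSTOM_RULES):
    """Return the device bound to the command, or None if no rule applies."""
    return rules.get(voice_command.strip().lower())


print(find_by_custom_rule("play music"))        # device 2
print(find_by_custom_rule("Play film and TV"))  # device 3
print(find_by_custom_rule("lower temperature")) # None
```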
In some embodiments, the server includes a fusion capability rule database and an instruction distribution module. The fusion capability rule database stores the local capability attribute parameters of all devices. Operators can update a device's local capability attribute parameters in the fusion capability rule database. For example, if a terminal device gains a new capability through an update, the corresponding local capability attribute parameter needs to be added for that device. In the fusion capability rule database, all devices are stored by device name and device ID.
The instruction distribution module receives the instruction distribution request sent by a terminal device and can parse the pending capability attribute parameter from the voice command carried in the request. The instruction distribution module then searches the fusion capability rule database for a local capability attribute parameter that matches the pending capability attribute parameter, and thereby finds a terminal device that can perform the operation corresponding to the voice command. In addition, the local capability filtering module of a terminal device can also look up its local capability attribute parameters in the fusion capability rule database.
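The relationship between the two server components can be sketched as below. Both class names, the in-memory dictionary used as storage, and the device IDs are assumptions made for illustration; the application does not specify a storage format or an interface.

```python
class FusionCapabilityRuleDatabase:
    """Illustrative in-memory store keyed by device ID."""

    def __init__(self):
        self._records = {}  # device_id -> (device_name, set of capabilities)

    def upsert(self, device_id, device_name, capabilities):
        self._records[device_id] = (device_name, set(capabilities))

    def add_capability(self, device_id, capability):
        # Used when a device gains a new capability through an update.
        self._records[device_id][1].add(capability)

    def capabilities_of(self, device_id):
        # A device's local capability filtering module could query this.
        return set(self._records[device_id][1])

    def find_capable(self, capability):
        return sorted(
            device_id
            for device_id, (_, caps) in self._records.items()
            if capability in caps
        )


class InstructionDistributionModule:
    def __init__(self, database):
        self.database = database

    def handle_request(self, voice_command):
        # Assumption: the command text directly names the pending capability.
        pending = voice_command.strip().lower()
        return self.database.find_capable(pending)


db = FusionCapabilityRuleDatabase()
db.upsert("tv-01", "TV", {"play audio and video"})
db.upsert("ac-01", "air conditioner", {"cooling", "heating"})
distributor = InstructionDistributionModule(db)

print(distributor.handle_request("play music"))  # []
db.add_capability("tv-01", "play music")         # operator records a new capability
print(distributor.handle_request("play music"))  # ['tv-01']
```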
In some embodiments, when the user inputs an ambiguous voice command, multiple terminal devices may be able to perform the corresponding operation. An ambiguous voice command may be an ambiguous device control command, an ambiguous media playback command, and so on.
For example, a home may contain multiple air conditioners. When the user inputs the voice command "turn on the living room air conditioner", the voice command can be sent directly to the living room air conditioner according to the device name rule and the space rule, so that the living room air conditioner starts up. When the user inputs the voice command "turn on the air conditioner", both the living room air conditioner and the bedroom air conditioner can perform the corresponding operation according to the device name rule. Other attributes can therefore be set to pin down a more specific device. For example, a time rule can be defined: turn on the living room air conditioner between 11:00 and 14:00, and turn on the bedroom air conditioner between 15:00 and 17:00. When the user inputs the voice command "turn on the air conditioner" at 12:00, the voice command is sent to the living room air conditioner according to the time rule, so that the living room air conditioner starts up.
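The time rule can be sketched with the standard library's `datetime.time`. The `TIME_RULES` list and the function `resolve_by_time` are hypothetical, and treating both ends of each window as inclusive is an assumption, since the text does not say how boundaries are handled.

```python
from datetime import time

# Hypothetical time rules: (start, end, device to use in that window).
TIME_RULES = [
    (time(11, 0), time(14, 0), "living room air conditioner"),
    (time(15, 0), time(17, 0), "bedroom air conditioner"),
]


def resolve_by_time(now, rules=TIME_RULES):
    """Return the device whose time window contains `now`, or None."""
    for start, end, device in rules:
        if start <= now <= end:  # both ends inclusive (assumption)
            return device
    return None


print(resolve_by_time(time(12, 0)))   # living room air conditioner
print(resolve_by_time(time(16, 30)))  # bedroom air conditioner
print(resolve_by_time(time(14, 30)))  # None: no rule covers this time
```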
Similarly, a home may contain multiple speakers. For voice commands related to children's stories or nursery rhymes, a rule can be defined so that such voice commands are sent to the speaker in the children's room.
In some embodiments, display devices in different spaces can also be switched on or off according to the broadcast time of the film or TV program the user wants to play. For example, the user inputs the voice command "play Xinwen Lianbo". The user may have specified in the defined rules that Xinwen Lianbo is watched in the living room, in which case the server sends the voice command to the display device in the living room, so that the living room display device plays Xinwen Lianbo.
In some embodiments, the voice command input by the user may include multiple matching items, for example a device name, a device response time period, the space where the device is located, and a device capability parameter. Different terminal devices may each satisfy some of the matching items included in the voice command. For example, suppose the voice command includes all four matching items: device name, device response time period, space, and device capability parameter. Device 1 satisfies the matching items device name and time period, and device 2 satisfies the matching items time period and device capability parameter. A weight value can be set for each matching item. For example, the weight value of the device name is 10, the weight value of the time period is 5, the weight value of the space is 3, and the weight value of the device capability parameter is 8. The total weight of each device is computed according to formula (1):
$$W = \sum_{i=1}^{n} a_i \tag{1}$$
where a_i is the weight value of the i-th matching item that a given terminal device satisfies, n is the number of matching items that device satisfies, and W is the device's total weight attribute value. Applying the formula to device 1 and device 2 gives their final weight values. The total weight attribute value of device 1 is 10 + 5 = 15, and the total weight attribute value of device 2 is 5 + 8 = 13. Device 1 has the largest total weight attribute value, so device 1 is the best-matching terminal device. The server therefore sends the voice command to device 1, so that device 1 performs the corresponding operation in response to the voice command.
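The weighted selection can be worked through in a few lines. The names `WEIGHTS`, `total_weight`, and `matches` are hypothetical; the weight values and the matched items are the ones stated in the example. With those stated values, the sum for device 2 (time period plus device capability parameter) comes to 5 + 8 = 13.

```python
# Weight of each matching item, as given in the example.
WEIGHTS = {
    "device name": 10,
    "time period": 5,
    "space": 3,
    "device capability": 8,
}


def total_weight(matched_items, weights=WEIGHTS):
    """Sum the weights of the matching items a device satisfies."""
    return sum(weights[item] for item in matched_items)


# Matching items satisfied by each candidate device.
matches = {
    "device 1": ["device name", "time period"],
    "device 2": ["time period", "device capability"],
}

totals = {device: total_weight(items) for device, items in matches.items()}
best = max(totals, key=totals.get)

print(totals)  # {'device 1': 15, 'device 2': 13}
print(best)    # device 1
```

The sketch does not say what should happen when two devices tie on total weight; `max` simply returns the first one encountered, which is an arbitrary choice here.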
It should be noted that the server of the present application can be divided into a semantic server and an instruction distribution server. The semantic server recognizes the voice command from the voice signal input by the user. The instruction distribution server stores the fusion capability rule database and, according to the instruction distribution request, searches for a terminal device that can perform the operation corresponding to the voice command. The semantic server may be a network server, while the instruction distribution server is a local server. Because a local server responds quickly, this arrangement can improve the response speed of the whole voice control process.
Based on the above embodiments, the present application further provides a voice control method for a terminal device. FIG. 6 is a signaling diagram of a voice control method for a terminal device provided in an embodiment of the present application. As shown in FIG. 6, the method includes the following steps:
S101: The microphone of the first terminal device receives a voice signal input by the user.
S102: The first terminal device sends the voice signal to the server.
S103: The server generates a voice command from the voice signal and feeds the voice command back to the first terminal device.
S104: The first terminal device determines whether it can perform the operation corresponding to the voice command.
S105: If the first terminal device can perform the operation corresponding to the voice command, it performs the operation in response to the voice command.
S106: If the first terminal device cannot perform the operation corresponding to the voice command, it sends an instruction distribution request to the server. The instruction distribution request carries the voice command.
S107: After receiving the instruction distribution request, the server searches, according to the request, for a second terminal device that can perform the operation corresponding to the voice command, and sends the voice command to the second terminal device.
S108: The second terminal device performs the corresponding operation in response to the voice command.
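Steps S101 to S108 can be sketched end to end as follows. The `Server` class, its methods, and `handle_voice_signal` are hypothetical names, and speech recognition is replaced by a stand-in that assumes the voice signal has already been transcribed to text.

```python
class Server:
    def __init__(self, capability_table):
        # device name -> set of capability strings known to the server
        self.capability_table = capability_table

    def recognize(self, voice_signal: str) -> str:
        # S103: stand-in for speech recognition (signal assumed to be text).
        return voice_signal.strip().lower()

    def distribute(self, voice_command, requester):
        # S107: find another device that can perform the operation.
        for name, capabilities in self.capability_table.items():
            if name != requester and voice_command in capabilities:
                return name
        return None


def handle_voice_signal(first_device, local_capabilities, voice_signal, server):
    """Return the name of the device that ends up executing the command."""
    command = server.recognize(voice_signal)         # S101 to S103
    if command in local_capabilities:                # S104
        return first_device                          # S105
    return server.distribute(command, first_device)  # S106 to S108


server = Server({
    "speaker": {"play music"},
    "bedroom tv": {"turn on the tv"},
})
print(handle_voice_signal("speaker", {"play music"}, "Play music", server))
# speaker
print(handle_voice_signal("speaker", {"play music"}, "Turn on the TV", server))
# bedroom tv
```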
In some embodiments, the first terminal device determines whether it can perform the operation corresponding to the voice command as follows:
The pending capability attribute parameter is parsed from the voice command, and the local capability filtering module of the first terminal device can obtain the local capability attribute parameters from the fusion capability rule database. The local capability attribute parameters are then matched against the pending capability attribute parameter. If they match, the first terminal device can perform the operation corresponding to the voice command. If they do not match, the first terminal device cannot perform that operation.
In some embodiments, if the voice command carries only a device name, the server searches for the second terminal device that corresponds to the device name. For example, if the voice command is "turn on the speaker", the server searches for the speaker device according to the device name "speaker".
In some embodiments, if the voice command carries only a device capability parameter, the server searches for a second terminal device that has the device capability parameter. For example, if the voice command is "lower the temperature", the pending device capability parameter is recognized as "lower the temperature". The instruction distribution module of the server can search the fusion capability rule database for a local capability attribute parameter that matches the pending device capability parameter, and thereby finds a terminal device that can perform the operation corresponding to the voice command.
In some embodiments, if the voice command is a command covered by a custom rule, the server searches for the second terminal device that corresponds to the custom rule. For example, in the scenario shown in FIG. 5, the custom rules include: device 2 is preferred for playing music, device 3 is preferred for playing film and TV, device 4 is preferred for playing audiobooks, and so on. If the voice command input by the user is "play music", device 2 corresponds to the custom rule and is determined to be the second terminal device.
In some embodiments, the voice command includes at least two matching items, and each matching item is assigned a weight attribute value. If, when searching by matching items, the server finds at least two terminal devices that each satisfy at least one matching item, it calculates for each of these terminal devices the total weight attribute value of all matching items the device satisfies, that is, the sum of the weight values. The terminal device with the largest total weight attribute value is determined to be the second terminal device.
The voice control system in the embodiments of the present application is a network system built on a specific area network and on a unified control service. The voice control system may include multiple terminal devices 200 that have established communication connections with one another. The terminal devices 200 can establish these connections by joining the same local area network. They can also form a point-to-point network directly through a unified communication protocol. For example, multiple terminal devices 200 can communicate with one another by connecting to the same wireless local area network. As another example, one terminal device 200 can establish communication connections with multiple other terminal devices 200 through Bluetooth, infrared, a cellular network, power line carrier communication, and so on.
The terminal device 200 is a device that has a communication function, can receive, send, and execute control instructions, and implements specific functions. The terminal device 200 includes, but is not limited to, a smart display device, a smart terminal, a smart home appliance, a smart gateway, a smart lighting device, a smart audio device, a game device, and the like. The terminal devices 200 that make up the voice control system may be of the same type or of different types. For example, as shown in FIG. 7, a single voice control system may include a smart TV, a smart speaker, a smart refrigerator, multiple smart lamps, and so on. These terminal devices 200 may be distributed across different locations to meet the usage needs at each location.
It should be noted that the voice control system described in the present application does not limit the scope of application of the solution for which protection is sought. In practice, the server, terminal device, and voice control method provided by the present application are not limited to the smart home field. They apply equally to other systems that support intelligent voice control, such as smart office systems, smart service systems, smart management systems, and industrial production systems.
The terminal device 200 has a specific hardware configuration that depends on its actual functions. As shown in FIG. 8, taking a display device as an example, a terminal device 200 with a display function may include at least one of a tuner-demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface.
In some embodiments, the controller 250 includes a central processing unit, a video processor, an audio processor, a graphics processor, RAM, ROM, and first to n-th interfaces for input/output.
In some embodiments, the display 260 includes a display screen component for presenting images and a driving component for driving image display. The display 260 receives image signals output by the controller and displays video content, image content, a menu manipulation interface, and a user manipulation UI.
In some embodiments, the display 260 may be at least one of a liquid crystal display, an OLED display, and a projection display, and may also be a projection device together with a projection screen.
In some embodiments, the tuner-demodulator 210 receives broadcast television signals by wired or wireless means and demodulates audio and video signals, as well as EPG data signals, from among multiple wireless or wired broadcast television signals.
In some embodiments, the external device interface 240 may include, but is not limited to, any one or more of the following: a high-definition multimedia interface (HDMI), an analog or digital high-definition component input interface (component), a composite video input interface (CVBS), a USB input interface (USB), an RGB port, and the like. It may also be a composite input/output interface formed from several of these interfaces.
In some embodiments, the controller 250 controls the operation of the smart device and responds to user operations through various software control programs stored in the memory. The controller 250 controls the overall operation of the terminal device 200. For example, in response to receiving a user command that selects a UI object displayed on the display 260, the controller 250 may perform the operation related to the selected object.
In some embodiments, the user may input a user command through a graphical user interface (GUI) displayed on the display 260, in which case the user input interface receives the command through the GUI. Alternatively, the user may input a user command by making a specific sound or gesture, in which case the user input interface receives the command by recognizing the sound or gesture through a sensor.
In some embodiments, the terminal device 200 also performs data communication with the server 400. The terminal device 200 may establish communication connections through a local area network (LAN), a wireless local area network (WLAN), and other networks. The server 400 may provide various content and interactions to the terminal device 200. The server 400 may be one cluster or multiple clusters, and may include one or more types of server groups.
In some embodiments, the terminal device 200-1 may have a built-in voice control system to support intelligent voice control by the user. Intelligent voice control refers to the interaction process in which the user operates the terminal device 200-1 by inputting voice audio data. To implement intelligent voice control, the terminal device 200-1 may include an audio input device and an audio output device. The audio input device collects the voice audio data input by the user and may be a microphone built into or externally connected to the terminal device 200-1. The audio output device emits sound to play voice responses. For example, as shown in FIG. 9, when the user inputs a wake-up word such as "Hi! Xiao ×" through the audio input device, the terminal device 200-1 may play the voice response "I'm here" through the audio output device to guide the user through the subsequent voice input.
In some embodiments, the intelligent voice system built into the terminal device 200 also supports a "one-shot" mode, in which a single utterance carries both the wake-up word and the command. In this mode, the user can invoke a control function with fewer voice inputs. For example, in the traditional mode, a user who wants the terminal device 200 to play a movie must first say "Hi, Xiao ×", wait for the terminal device 200 to reply "I'm here", and then say "I want to watch a movie", after which the terminal device 200 replies "The following movies have been found for you". In the one-shot mode, the user can say "Hi! Xiao ×, I want to watch a movie" directly, and the terminal device 200 replies "The following movies have been found for you" as soon as it receives the voice instruction. This reduces the number of voice interactions and improves the efficiency of voice interaction.
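The difference between the traditional mode and the one-shot mode can be illustrated with a minimal sketch that operates on already-recognized text. It is not taken from the application: the function names `normalize`, `split_wake_word`, and `respond` are hypothetical, and the wake-up word is written with an ASCII "x" in place of the "×" placeholder.

```python
import re

# Hypothetical wake-up word list; "x" stands in for the "×" placeholder.
WAKE_WORDS = ("hi xiao x",)


def normalize(text):
    """Lower-case the text and drop punctuation so matching is robust."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def split_wake_word(text):
    """Return (woken, command); command is '' if only the wake-up word was spoken."""
    normalized = normalize(text)
    for wake in WAKE_WORDS:
        if normalized == wake or normalized.startswith(wake + " "):
            return True, normalized[len(wake):].strip()
    return False, ""


def respond(text):
    woken, command = split_wake_word(text)
    if not woken:
        return None  # no wake-up word: ignore, to avoid false triggering
    if not command:
        return "I'm here"  # traditional mode: wait for the follow-up input
    return "handling: " + command  # one-shot mode: act on the command directly
```

With this sketch, an utterance containing only the wake-up word yields the "I'm here" prompt, while an utterance containing the wake-up word followed by a command is handled in a single step.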
For multiple terminal devices 200 in the same voice control system, the user can control the devices in linkage through intelligent voice. For example, the user may input the voice instruction "turn on the bedroom light" through a smart speaker. In response, the smart speaker generates a control instruction for turning on the light and sends it to the lamp named "bedroom" in the voice control system, thereby turning on the bedroom light. The smart speaker also responds to the user's voice input by playing feedback such as "The bedroom light has been turned on for you".
During linkage control among multiple terminal devices 200, the control instruction may be passed directly to the controlled device by the terminal device 200-1 that received the user's voice audio data, or it may be passed by the terminal device 200-1 to a dedicated relay device such as a router, which then forwards it to the controlled device. In some embodiments, the control instruction may also be passed to the controlled device through the server 400. For example, when the user is outside the local area network of the smart home and controls a terminal device 200 in the voice control system through the smart terminal 300, the smart terminal 300 may first send the control instruction to the server 400, and the server 400 then forwards the control instruction to the terminal device 200.
To control the terminal devices 200 in the voice control system, the server 400 may issue control instructions and related data to any individual terminal device 200. For example, for a display device, the user may interactively request online playback of a media asset, and the server 400 then returns the media asset data to the display device according to the playback request. For linkage control of multiple terminal devices 200, the server 400 may issue control instructions and related data to the voice control system as a whole. For example, when the user turns on the bedroom lamp through a smart speaker, the smart speaker may send the control instruction input by the user to the server 400. The server 400 then sends feedback data to the voice control system, so that the voice control system sends a turn-on instruction to the bedroom lamp and returns a control response to the smart speaker.
Some terminal devices 200 in the voice control system may have a complete voice control system built in. Such a terminal device 200 can act as a master control device: it can receive, process, and respond to voice independently, and it can send the control instructions corresponding to the voice audio to other terminal devices 200. For example, terminal devices 200 such as display devices, smart speakers, and smart refrigerators may have a complete voice control system built in to receive voice audio input by the user. Other terminal devices 200 in the voice control system may lack a complete intelligent voice system and act only as controlled devices that receive control instructions from the master control device. For example, smart devices such as lamps and small household appliances may receive control instructions from a display device acting as the master control device in order to start, stop, or change operating parameters.
Because more and more terminal devices support a complete voice control system, a single voice control system may include several of them. For example, a smart TV, a smart speaker, and a smart refrigerator may be placed in the same room, each with a complete voice control system built in and each able to respond to voice instructions input by the user. However, different terminal devices 200 that support a complete voice control system differ in how they actually respond to voice instructions and in which types of voice instructions they can respond to. For example, as shown in FIG. 10, for the voice instruction "I want to watch a movie", the smart TV can respond by displaying a list of movies and playing the voice feedback "The following movies have been found for you". The smart speaker and the smart refrigerator cannot fulfil the request, so they play the voice feedback "I don't understand what you are saying".
It can be seen that, because the voice control system contains multiple terminal devices 200 capable of voice control, the same voice instruction may wake up several terminal devices 200 at once or wake up the wrong one. This results in a confusing scene and seriously degrades the user experience.
To alleviate this confusion, in some embodiments the user can designate the responding device through an application on the smart terminal 300 according to usage habits, and can freely switch between different wake-up strategies. For example, the user may manually set the smart speaker as the primary responding device. Voice instructions input by the user are then answered by the smart speaker, which sends control commands to the other terminal devices 200, thereby achieving intelligent voice control of the terminal devices throughout the voice control system.
However, controlling the wake-up strategy through user-defined settings requires the user to switch manually many times, which is not sufficiently intelligent. Moreover, whichever wake-up strategy is selected, multi-device wake-up is currently decided by the devices to be woken communicating with one another to determine which terminal device wakes up. This approach carries considerable risk. On the one hand, when there are many devices to be woken, the wake-up process requires information exchange between every pair of terminal devices 200, so there is no guarantee that all exchanges complete within the specified time, which leads to abnormal responses from the terminal devices 200. On the other hand, different types of terminal devices 200 have different wake-up delays, that is, different times from wake-up to response, so there is no guarantee that all of them are within the device information exchange period at the same time when the wake-up decision is made. A terminal device 200 with a long wake-up delay may not yet have received the wake-up word when the exchange takes place and may therefore miss it. That terminal device 200 then cannot respond to the voice, and voice control becomes abnormal.
To alleviate the problem of abnormal voice control, some embodiments of the present application provide a voice control method that can be applied to a voice control system. The voice control system includes a server 400 and multiple terminal devices 200. The server 400 includes at least a storage module 410, a communication module 420, and a control module 430. The storage module 410 is configured to store the device status reported by the terminal devices 200. The communication module 420 is configured to establish communication connections with the multiple terminal devices 200, to obtain the device status they report, and to issue control instructions and related data to them. The control module 430 is configured to execute the server-side program steps of the voice control method, so as to issue a response instruction or a silent instruction to different terminal devices 200.
Similarly, to implement the voice control method, each terminal device 200 in the voice control system includes at least an audio input device, an audio output device, a communicator 220, and a controller 250. The audio input device is configured to detect voice audio data input by the user. The audio output device is configured to play voice responses. The communicator 220 is configured to establish a communication connection with the server 400, to report the device status to the server 400, and to receive the response instruction or silent instruction issued by the server 400. The controller 250 is configured to execute the program steps of the voice control method that run on the terminal device 200 side, so as to complete the response in the intelligent voice control process.
As shown in FIG. 11 and FIG. 12, the voice control method includes the following:
The terminal device 200 acquires the voice audio data input by the user. A user within the voice control system environment can input voice at any time. The audio input device built into the terminal device 200 converts the user's voice sound signal into an electrical signal and obtains the voice audio data through a series of signal processing steps such as noise reduction, amplification, encoding, and conversion. During voice interaction, the user can input voice audio data in several ways. In some embodiments, the user inputs voice audio data through the audio input device built into the terminal device 200. For example, the user may say "Hi! Xiao ×, I want to watch a movie" to the microphone built into the terminal device 200. The microphone converts the voice sound signal into an electrical signal and passes it to the controller 250 for subsequent processing.
To trigger intelligent voice control on the terminal device 200, in some embodiments the user may include a specific wake-up word in the voice instruction. The wake-up word is a piece of speech with specific content, such as "Hi! Xiao ×", "Xiao × Xiao ×", or "Hey! ××". While the user is inputting voice audio data, especially through a far-field microphone built into the terminal device 200, the terminal device 200 can determine whether the input speech contains the wake-up word and carry out subsequent processing only after the wake-up word is detected. This reduces false triggering of the intelligent voice control process.
Because of how sound signals propagate, a terminal device 200 that is closer to the user detects the user's voice with less volume attenuation and over a shorter propagation distance, so it detects the user's voice audio data before more distant devices do. However, the content of the user's speech differs from case to case, so which terminal device 200 should respond is not fixed: it may be a device close to the user or a device farther away. For example, when the user says "Hi! Xiao ×, I want to watch a movie" in the bedroom, the smart speaker in the bedroom detects the voice audio data first, but the smart speaker has no video playback function, whereas the smart TV in the living room does.
Therefore, to respond to the user's current speech, the terminal device 200 generates a voice instruction from the voice audio data after acquiring it. A voice instruction is a control command with a specific instruction format that includes items such as a control action function and a control object code. After receiving the voice audio data, the terminal device 200 may first convert it to text through the voice processing module of the intelligent voice system, that is, use acoustic feature extraction to convert the waveform data in the voice audio data into text data.
After the conversion to text data, the terminal device 200 may use a word segmentation tool to turn the unstructured text data into structured text data. Specifically, the terminal device 200 may use lexicon matching or similar means to remove text with no practical meaning, such as modal particles and auxiliary words, retain the keywords in the text data, and separate the keywords according to their meaning to obtain structured text.
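The structuring step can be sketched as follows. This is only an illustration under stated assumptions: the lexicon `STOP_WORDS` and the function `structure_text` are hypothetical, and simple whitespace-style tokenization of English text stands in for the word segmentation tool, which the application does not specify.

```python
import re

# Hypothetical lexicon of words that carry no control meaning.
STOP_WORDS = {"um", "hi", "please", "i", "want", "to", "a", "the"}


def structure_text(text):
    """Split recognized text into tokens and keep only the keywords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())  # crude tokenization
    return [token for token in tokens if token not in STOP_WORDS]
```

For instance, `structure_text("Um, I want to watch a movie, please")` keeps only the keywords `watch` and `movie`.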
After obtaining the structured text data, the terminal device 200 may input the structured text into a text processing model. The text processing model is an artificial intelligence model based on machine learning. Given text data as input, it computes the classification probability that the text belongs to each specific meaning. By using the standard control instructions as classification labels, the text processing model can therefore output the classification probability of the text data for each standard control instruction, and the standard control instruction with the highest classification probability is taken as the control instruction corresponding to the voice audio data.
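The selection of the highest-probability instruction can be sketched as below. The application does not describe the model's internals, so this example substitutes a hand-written keyword-weight table (`KEYWORD_WEIGHTS`, hypothetical) for a trained model and applies a softmax to obtain probabilities; the label names are illustrative assumptions as well.

```python
import math

# Hypothetical stand-in for a trained model: per-label keyword weights.
KEYWORD_WEIGHTS = {
    "music_play": {"music": 2.0, "listen": 1.0, "play": 0.5},
    "movie_play": {"movie": 2.0, "watch": 1.0, "play": 0.5},
    "light_power_on": {"light": 2.0, "turn": 0.5, "on": 0.5},
}


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())
    exps = {label: math.exp(score - peak) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def classify(keywords):
    """Return the most probable standard control instruction and all probabilities."""
    scores = {
        label: sum(weights.get(word, 0.0) for word in keywords)
        for label, weights in KEYWORD_WEIGHTS.items()
    }
    probabilities = softmax(scores)
    best = max(probabilities, key=probabilities.get)
    return best, probabilities
```

Calling `classify(["watch", "movie"])` returns the label with the largest probability together with the full distribution, which mirrors the "highest classification probability" rule described above.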
The text processing model can be obtained by repeatedly training an initial model with sample data and defined input/output rules. The sample data is labeled text. During training, the sample data is used as input and the classification probability as output. The output is compared with the label in the sample data to obtain the training error, and the training error is then back-propagated, that is, the model parameters are adjusted according to the training error. After a large amount of sample data has been input repeatedly, a text processing model that outputs accurate recognition results is obtained.
After the model computation, the terminal device 200 has converted the voice audio data input by the user into a voice instruction. Because the conversion has been done by the terminal device 200, the controlled device or the server 400 can process the voice instruction directly on receipt, for example by executing a control action according to the voice instruction or extracting service requirement information from it.
In some embodiments, the terminal device 200 may instead send the voice audio data itself as the voice instruction. That is, a terminal device 200 with low data processing capability, or without a complete built-in voice control system, may forward the audio data directly and leave the speech processing to the server 400 or another terminal device 200, which reduces the computing load on that terminal device 200.
After generating the voice instruction, the terminal device 200 may send it to the server 400 to trigger the server 400 to control the wake-up process of the multiple terminal devices 200. It should be noted that, because the voice control system may include multiple terminal devices 200 with built-in voice control systems, several of them may detect the voice audio data when the user speaks. To avoid redundant data transmission, the server 400 may, after receiving one voice instruction, suspend the generation and reporting of voice instructions in the other terminal devices 200.
For example, after the smart TV sends a voice instruction to the server 400, the server 400 may send a control instruction for suspending instruction generation and transmission to the smart speaker and the smart refrigerator in the same voice control system as the smart TV. On receiving this control instruction, the smart speaker and the smart refrigerator stop generating and sending voice instructions. A terminal device 200 with higher data processing capability can usually finish processing the voice audio data in less time and therefore generates its voice instruction before the other devices. Consequently, having the server 400 stop voice instruction generation and reporting on the other terminal devices 200 as soon as it receives the first voice instruction also shortens the time needed to obtain a voice instruction and improves the voice response speed.
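The "first instruction wins" behavior can be sketched on the server side as follows. This is a minimal illustration rather than the application's implementation: the class `WakeArbiter`, its method `on_voice_instruction`, and the device identifiers are hypothetical, and the sketch simply returns the devices that should receive a suspend instruction instead of sending anything over a network.

```python
class WakeArbiter:
    """Accept the first voice instruction per voice control system."""

    def __init__(self, systems):
        # systems: mapping of system id -> set of device ids (hypothetical layout)
        self.systems = systems
        self.accepted = {}  # system id -> device whose instruction was accepted

    def on_voice_instruction(self, system_id, device_id):
        """Return the devices that should be told to suspend generation and reporting."""
        if system_id in self.accepted:
            return []  # a later duplicate of the same utterance: nothing to do
        self.accepted[system_id] = device_id
        return sorted(d for d in self.systems[system_id] if d != device_id)
```

In this sketch, the first device to report causes every other device in the same system to be listed for suspension, and subsequent reports for the same utterance are ignored. A real system would also need to clear the accepted entry when the interaction ends, which is omitted here.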
After receiving the voice instruction, the server 400 may parse service requirement information from it. Different voice instructions input by the user carry different control content and therefore correspond to different service requirements. For example, when the user says "Hi! Xiao ×, I want to listen to music", the terminal device 200 processes the speech and generates a voice instruction that contains the "play music" (music_play) service requirement. When the user says "Hi! Xiao ×, turn on the bedroom light", a voice instruction containing the "turn on the lamp" (light_power on) service requirement is generated.
When the voice instruction already contains service requirement information, the server 400 can extract that information directly from the voice instruction. When the voice instruction is voice audio data uploaded by the terminal device 200, the server 400 may perform recognition on the uploaded voice audio data in the same way as the terminal device 200 does in the above embodiments. That is, the server 400 may use built-in speech-to-text tools, text structuring tools, and the text processing model to recognize the voice audio data and identify the service requirement information in it.
To make it easier for the server 400 to parse service requirement information from voice instructions, in some embodiments a service requirement recognition model may be provided, or the output classes of the above text processing model may be set to service requirements, so that the model computes the classification probability of the user's voice audio data for each service requirement.
It should be noted that the speech input by the user may contain several user intentions, so several service requirements may be parsed from the corresponding voice instruction. For example, if the user says "Hi! Xiao ×, turn on the living room light and play a movie", the two service requirements "turn on the lamp" and "play a movie" can be parsed from the voice instruction. In addition, the voice control system can offer richer voice interaction by presetting a richer instruction set, and the service requirements contained in an instruction can then be determined from that instruction set. For example, if the user says "Hi! Xiao ×, turn on theater mode", the voice control system can determine from the "theater mode" instruction set that the control content is to play a movie and turn off the lights at the same time, imitating the atmosphere of a movie theater. The server 400 can therefore parse the two service requirements "turn off the lamp" and "play a movie" from the voice instruction.
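Parsing several service requirements from one instruction, including expansion of a preset scene such as "theater mode", can be sketched as below. The tables `SCENE_INSTRUCTION_SETS` and `PHRASE_TO_REQUIREMENT`, the requirement names, and the phrase-matching approach are all illustrative assumptions; the application does not specify how the instruction set is stored or matched.

```python
# Hypothetical preset instruction sets: one scene expands to several requirements.
SCENE_INSTRUCTION_SETS = {
    "theater mode": ["light_power_off", "movie_play"],
}

# Hypothetical phrase table for directly stated intentions.
PHRASE_TO_REQUIREMENT = {
    "turn on the living room light": "light_power_on",
    "play a movie": "movie_play",
}


def parse_requirements(command):
    """Return every service requirement implied by the recognized command text."""
    text = command.lower()
    for scene, requirements in SCENE_INSTRUCTION_SETS.items():
        if scene in text:
            return list(requirements)  # a scene expands to its preset requirements
    return [req for phrase, req in PHRASE_TO_REQUIREMENT.items() if phrase in text]
```

Here "turn on theater mode" expands to a lights-off requirement plus a movie-playback requirement, while an utterance that names two actions explicitly yields one requirement per action.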
Different service requirements correspond to different control operations on the terminal device 200, and each requires the terminal device 200 that responds to the voice instruction to be in a particular device status. For example, a lamp can support controls such as on/off and brightness adjustment only when it is in the standby state. When the user cuts the lamp's power through a wall switch so that it is offline, the lamp cannot support on/off, brightness adjustment, or similar controls.
Therefore, the terminal device 200 may report its device status to the server 400 according to a predetermined reporting strategy. In some embodiments, the terminal device 200 reports its current device status to the server 400 at a set interval determined by the data update frequency, and the server 400 updates the stored device status according to what the terminal device 200 reports.
For example, the server 400 may send a heartbeat instruction to the terminal device 200. On receiving it, the terminal device 200 returns its current device status to the server 400 so that the server 400 can update the stored device status. If the terminal device 200 does not reply within a preset period after the server 400 sends the heartbeat instruction, the server 400 may update the corresponding device status to offline.
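The heartbeat bookkeeping can be sketched as a small status store. This is a simplified illustration: the class `DeviceStatusStore`, its methods, and the status strings are hypothetical, timestamps are passed in explicitly so the logic is easy to follow, and the network exchange itself is omitted.

```python
class DeviceStatusStore:
    """Keep the latest reported status and mark silent devices as offline."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s  # preset period allowed for a heartbeat reply
        self.status = {}            # device id -> last known status
        self.last_seen = {}         # device id -> time of the last reply

    def on_heartbeat_reply(self, device_id, status, now):
        """Record the status a device returned in reply to a heartbeat."""
        self.status[device_id] = status
        self.last_seen[device_id] = now

    def sweep(self, now):
        """Mark every device that has not replied within the period as offline."""
        for device_id, seen in self.last_seen.items():
            if now - seen > self.timeout_s:
                self.status[device_id] = "offline"
```

A periodic call to `sweep` plays the role of the timeout check: devices that replied recently keep their reported status, and the rest are recorded as offline.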
To ensure that the device status used during voice interaction is as current as possible, in some embodiments the reporting of device status by the terminal devices 200 may also be triggered by a voice instruction. That is, the server 400 may obtain the voice audio data corresponding to the voice instruction and recognize the wake-up word in it. If the voice audio data includes the wake-up word, the server 400 locates the voice control system to which the terminal device 200 belongs and sends a status acquisition request to that voice control system. All terminal devices 200 in the voice control system may report their device status after receiving the status acquisition request.
For example, when the terminal device 200 reports voice audio data, the server 400 may recognize the wake-up word "Hi! Xiao ×" in it. After recognizing the wake-up word, the server 400 may determine from the identification information of the terminal device 200 which voice control system the user is currently using, for example "××'s home system", which has a smart TV, speaker A, and speaker B in the living room, a lamp and speaker C in the bedroom, and a smart refrigerator in the kitchen. The server 400 then sends a status acquisition request to this voice control system so that the TV, speaker A, speaker B, the lamp, speaker C, and the smart refrigerator report their current device status.
After obtaining the service requirement information and the device states reported by the terminal devices 200, the server 400 may screen for a second terminal device according to the service requirement information and the device states. The second terminal device is a smart device whose device state enables it to fulfil the service requirement information.
Whether a terminal device 200 can fulfil a service requirement depends on specific preconditions, such as device type and device state. Therefore, when screening for the second terminal device, the server 400 may perform multi-level screening of the terminal devices 200 in the current voice control system according to the different preconditions. For example, if the user inputs the voice "Hi! Xiao ×, turn on the light", the corresponding service requirement is "turn on the lamp", and the preconditions for fulfilling it are that the device type is a lamp and the device state is standby. The server 400 may first screen out, by device type, all terminal devices 200 in the current voice control system whose type is lamp, and then screen out, by device state, the lamps whose device state is standby, which serve as the second terminal device.
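The two-level screening in the lamp example can be sketched as below. The `Device` record, its field names, and the type and state strings are hypothetical; the application does not specify a data model.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical record of one terminal device as known to the server."""
    device_id: str
    device_type: str
    state: str


def screen_devices(devices, required_type, required_state):
    """Screen by device type first, then by device state."""
    by_type = [d for d in devices if d.device_type == required_type]
    return [d for d in by_type if d.state == required_state]


# "Turn on the light" needs a lamp that is currently in standby.
home = [
    Device("lamp-1", "lamp", "standby"),
    Device("lamp-2", "lamp", "on"),
    Device("tv-1", "tv", "standby"),
]
second_devices = screen_devices(home, "lamp", "standby")
```

Here only `lamp-1` passes both levels; `lamp-2` fails the state check and `tv-1` fails the type check.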
After screening out the second terminal device, the server 400 sends a response instruction to the terminal device 200 serving as the second terminal device, which runs the response instruction to respond to the voice control function. At the same time, the server 400 sends a silence instruction to the terminal devices 200 in the current voice control system other than the second terminal device, so that those devices run the silence instruction and do not respond to the voice control function.
For example, if the user inputs the voice "Hi! Xiao ×, turn on the light", the terminal devices 200 in the home environment that support voice interaction report the received voice command and their device states to the server 400, that is, the voice command "turn on the lamp" and the device state (standby). After receiving the voice command, the server 400 may determine that the current device state of one lamp falls within the object category corresponding to the service requirement in the user's voice command. The server 400 may therefore issue a response instruction to wake up that lamp and, at the same time, issue silence instructions to the other devices. The devices execute the corresponding instructions: the lamp that matches the service requirement and device state is turned on, while the other terminal devices 200, which do not match, remain silent.
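The split between response and silence instructions described above can be sketched as a single mapping from device to instruction. The function name and the strings `"response"` and `"silence"` are illustrative assumptions, not the message format of the application.

```python
def build_instructions(all_device_ids, second_device_ids):
    """Return the instruction each device in the voice control system receives.

    Screened-out second terminal devices get a response instruction;
    every other device gets a silence instruction.
    """
    targets = set(second_device_ids)
    return {
        device_id: ("response" if device_id in targets else "silence")
        for device_id in all_device_ids
    }


# Only the lamp matches "turn on the light"; everything else stays silent.
plan = build_instructions(["lamp", "tv", "speaker-a"], ["lamp"])
```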
It can be seen from the above that the voice control method provided in the above embodiments can use the service requirement information contained in the voice command and the device states reported by the terminal devices 200 to screen out, in the current voice control system, a second terminal device capable of responding to the voice command. A response instruction is sent to the second terminal device, and silence instructions are sent to the other devices. In this way, after the smart voice system receives a voice command input by the user, each terminal device 200 exchanges information only through its own communication with the server 400, and the server 400 determines the second terminal device automatically. This reduces data exchange among the terminal devices 200 and alleviates the low execution rate caused by frequent communication among multiple devices.
When inputting voice, the user may name the execution device explicitly in the voice command. For example, if the voice content input by the user is "turn on the TV", the execution device is explicitly the TV. Because the execution device is explicit, the server 400 can pass the voice command directly to the TV and determine the execution device without parsing the service requirement information. Therefore, in some embodiments, after receiving the voice command reported by the terminal device 200-1, the server 400 may also check the voice command for an execution device. If the voice command does not name an execution device, the second terminal device is screened out in the manner provided in the above embodiments, by parsing the service requirement information and matching it against the device states.
If the voice command names an execution device, that is, it includes identification information of the execution device, a control command and feedback voice information may be generated according to the voice command. The control command is a command that corresponds to the voice command and is directed at the execution device. For example, for the voice "turn on the TV" input by the user, the generated control command is "TV_power on". The feedback voice information is voice audio produced in response to the voice content and is used to inform the user of the execution result of the command. For example, after the user inputs the voice "turn on the TV", the smart voice system, once the TV has been turned on, plays the feedback voice information "The TV has been turned on for you".
The control command and the feedback voice information may be sent to a specific terminal device 200, so that the service corresponding to the control command is carried out by executing the control command, and the user is informed of the service execution result by playing the feedback voice information. Both the control command and the feedback voice information may act on the execution device. For example, when the user inputs the voice "turn on the TV", the TV powers on in response and, at the same time, plays the voice feedback "The TV has been turned on for you" through its smart voice system and speaker.
However, the execution device may be located far from the user. If the feedback voice information is played through the execution device in that case, the user may be unable to hear it clearly because of the distance and therefore cannot learn the control result of the voice interaction. Moreover, when there are multiple smart voice control devices in the home, the user often does not care which device is woken up to report the execution result. For this reason, in some embodiments, the server 400 may send the control command and the feedback voice information to different terminal devices 200. That is, the server 400 may send the control command to the execution device according to the identification information of the execution device, and send the feedback voice information to the smart device through which the voice command was input.
For example, when the user utters the voice "turn on the TV" in the bedroom, the smart air conditioner in the bedroom, which has a smart voice system, first detects the voice audio data, generates a voice command, and sends it to the server 400. According to the voice command, the server 400 determines that the TV is the execution device and generates the control command "TV_power on" and the feedback voice information "The TV has been turned on for you". The server 400 then sends the control command to the TV in the living room to turn it on, and sends the feedback voice information to the smart air conditioner so that the voice feedback "The TV has been turned on for you" is played through the smart air conditioner in the bedroom.
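The routing in this example, where the control command and the feedback voice go to different devices, might be sketched as follows. The function name, the device identifiers, and the tuple layout are hypothetical; only the command string "TV_power on" and the feedback sentence come from the text.

```python
def route_outputs(input_device_id, execution_device_id, control_command, feedback_text):
    """Return (destination, kind, payload) messages for one voice command.

    The control command goes to the execution device named in the voice
    command; the feedback voice goes to the device that captured the voice.
    """
    return [
        (execution_device_id, "control", control_command),
        (input_device_id, "feedback", feedback_text),
    ]


# The bedroom air conditioner hears "turn on the TV"; the TV is in the living room.
messages = route_outputs(
    input_device_id="air-conditioner-bedroom",
    execution_device_id="tv-living-room",
    control_command="TV_power on",
    feedback_text="The TV has been turned on for you",
)
```

When the execution device is also the device that captured the voice, both messages simply carry the same destination.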
It can be seen that, in the above embodiments, when the voice command names an execution device, the execution device and the terminal device 200 through which the voice command was input can each respond to the voice command, which fulfils the service requirement while giving the user better feedback.
A voice control system may contain multiple terminal devices 200, and different terminal devices 200 may support the same service requirement and be in the same device state at the same time. Screening devices in the manner of the above embodiments may therefore yield multiple second terminal devices. If the server 400 then sends a response instruction directly to each terminal device 200 serving as a second terminal device, multiple terminal devices 200 will respond to one voice command at the same time, and the problem of a confused scene remains.
To address this, the server 400 may refine the screening by adding screening conditions, so as to reduce the number of terminal devices 200 serving as the second terminal device. That is, in some embodiments, the service requirement information may further include a service type and a service state. When screening for the second terminal device according to the service requirement information, the server 400 may extract the service type and the service state from the service requirement information and match, among the device states, candidate devices that satisfy the service type, a candidate device being one whose device type meets the needs of the service type. The server 400 then traverses the device states of the candidate devices to screen out the second terminal device whose device state matches the service state.
For example, if the user inputs the voice "Hi! Xiao ×, turn off the music", the terminal devices 200 in the home environment report the received voice command, the current device type, and the device state to the cloud server 400, that is, the device type (music) and the device state (playing). After receiving the reported content, the server 400 may screen the device types and device states in the current voice control system according to the service type and service state required by the voice command, and determine that one speaker that is playing music has a device type and device state that fall within the object category of the user's voice command. The server 400 may therefore issue a response instruction to that speaker and, at the same time, issue silence instructions to the other devices, so that the speaker in the current voice control system executes the response instruction and turns off the music.
In some embodiments, the service requirement information further includes a service execution location, and the server 400 may further screen the terminal devices 200 according to the service execution location to determine the second terminal device. That is, when screening for the second terminal device according to the service requirement information, the server 400 may extract the service execution location from the service requirement information and acquire the device location of each candidate device in the current voice control system. If the device location of a candidate device coincides with the service execution location, that is, the candidate device satisfies the service execution location, the step of traversing the device states of the candidate devices may be performed to screen out the second terminal device whose device state matches the service state. If the device location of a candidate device does not coincide with the service execution location, the candidate device is marked as not being the second terminal device and may be deleted from the candidate device list.
For example, if the user inputs the voice "Hi! Xiao ×, have the bedroom speaker play music", the terminal devices 200 in the home environment report the received user command, together with the current device type and device state, to the cloud server, that is, the device type (none) and the device state (standby). After receiving the information reported by the terminal devices 200, the server 400 may parse the service execution location "bedroom" from the voice command and screen the terminal devices 200 in the current voice control system by that location to determine the terminal devices 200 located in the bedroom. When the server 400 determines that one speaker located in the bedroom has a device type and device state that fall within the object category controlled by the user's command, it may issue a response instruction to that speaker. At the same time, the server 400 issues silence instructions to the other devices in the current voice control system, including the other devices in the bedroom and the devices outside the bedroom.
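The location stage described above can be sketched as one more filter applied to the candidate list before the state check. The dictionary keys and the location and state strings are assumptions made for illustration.

```python
def screen_by_location(candidates, service_location, service_state):
    """Keep candidates that sit at the service execution location and whose
    device state matches the service state."""
    remaining = []
    for device in candidates:
        if device["location"] != service_location:
            continue  # not a second terminal device: drop it from the candidate list
        if device["state"] == service_state:
            remaining.append(device)
    return remaining


# "Have the bedroom speaker play music": only a standby speaker in the bedroom qualifies.
speakers = [
    {"id": "speaker-a", "location": "living room", "state": "standby"},
    {"id": "speaker-b", "location": "living room", "state": "playing"},
    {"id": "speaker-c", "location": "bedroom", "state": "standby"},
]
chosen = screen_by_location(speakers, "bedroom", "standby")
```

Comparing locations by string equality is the simplest reading of "coincides"; a deployment might instead map rooms to zones.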
It can be seen from the above that the voice control method provided in the above embodiments can perform multiple rounds of screening on the terminal devices 200 in the voice control system based on service requirement information such as service type, service state, and service execution location, so as to arrive at a small number of second terminal devices, reduce the frequency of communication among the terminal devices 200, and improve the execution efficiency of smart voice control.
Through the screening process provided in the above embodiments, the server 400 can screen out, from among many terminal devices 200, the second terminal device that can respond to the control command. Although this screening can greatly reduce the number of terminal devices 200 serving as the second terminal device, in some cases several terminal devices 200 still satisfy the service requirement, whereas the user's voice control usually needs only one or a few specific second terminal devices to respond.
Therefore, as shown in FIG. 13, in order to determine the second terminal device that finally responds, in some embodiments the server 400 may further determine the final execution device from among the multiple screened terminal devices 200 that can satisfy the service requirement. That is, when screening for the second terminal device according to the service requirement information, the server 400 may perform the following steps:
S201: Acquire the number of terminal devices whose device state enables them to fulfil the service requirement information.
S202: If the number of terminal devices equals 1, that is, only one terminal device 200 in the current voice control system can satisfy the current service requirement, the server 400 may directly mark that terminal device 200 as the second terminal device.
S203: If the number of terminal devices is greater than or equal to 2, find a master device. The master device is one of the multiple terminal devices capable of fulfilling the service requirement information.
S204: The master device may interact further with the user to determine the second terminal device that finally responds to the voice command.
That is, in some embodiments, after finding the master device, the server 400 may send an inquiry instruction to the master device so that the master device plays an inquiry voice, the inquiry instruction being a multi-round, wake-up-free voice interaction instruction. The server 400 then receives the confirmation voice command input by the user through the master device and extracts identification information of the second terminal device from it, so as to screen out the second terminal device, according to that identification information, from among the multiple smart devices capable of fulfilling the service requirement information.
For example, suppose the user's environment includes two terminal devices 200 that are playing music, speaker A and speaker B. When the user inputs the voice "Hi! Xiao ×, turn off the music", speaker A and speaker B each receive the voice command and report their current device type (music) and device state (playing) to the cloud server 400. After receiving this content, the server 400 may screen out the terminal devices 200 that satisfy the service requirement in the voice command. That is, after determining that the current device type and device state of both speakers match the required service type and service state, the server 400 designates speaker A as the master device and issues a multi-round, wake-up-free inquiry instruction to speaker A, namely "You have two devices, speaker A and speaker B. Which one do you want to turn off?" The server 400 then receives the confirmation voice command from the user: if the user replies "Turn off the music on speaker A", speaker A is determined to be the second terminal device that finally executes the voice control response. The server 400 may then send a response instruction to speaker A and silence instructions to the other terminal devices 200, including speaker B.
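Steps S201 to S204 and the speaker A / speaker B exchange can be sketched as a small decision function. The callables `pick_master` and `ask_user` stand in for master-device selection and the wake-up-free inquiry round; both, along with the function name, are hypothetical placeholders.

```python
def resolve_second_device(capable_ids, pick_master, ask_user):
    """Pick the device that finally responds to the voice command.

    capable_ids: devices whose state can fulfil the service requirement.
    pick_master: callable choosing the master device from capable_ids.
    ask_user:    callable (master_id, capable_ids) -> device id named by
                 the user in the confirmation voice command.
    """
    count = len(capable_ids)                 # S201
    if count == 0:
        return None
    if count == 1:                           # S202
        return capable_ids[0]
    master_id = pick_master(capable_ids)     # S203
    answer = ask_user(master_id, capable_ids)  # S204
    return answer if answer in capable_ids else None


# Two speakers are playing; the user confirms speaker A through the master device.
final = resolve_second_device(
    ["speaker-a", "speaker-b"],
    pick_master=lambda ids: ids[0],
    ask_user=lambda master, ids: "speaker-a",
)
```

Returning `None` for an unrecognized answer is a design choice of this sketch; the text does not say how such a reply is handled.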
To find the master device among the multiple terminal devices 200 capable of fulfilling the service requirement information, as shown in FIG. 14, in some embodiments the master device may be the terminal device 200 closest to the sound source position of the voice command. When finding the master device, the server 400 may acquire the voice audio data that the multiple smart devices capable of fulfilling the service requirement information detected for the voice command, extract sound energy values from the voice audio data, and compare them to find the terminal device 200 with the highest sound energy value, which is then marked as the master device.
The reverberation time parameter T60 of a given scene is fixed, that is, the time required for the energy to decay by 60 dB is the same at any position, and T60 can be estimated from the ratio of direct sound energy to reverberant sound energy at the corresponding position. Therefore, based on the beamformed spectrogram and the time difference of arrival of the sound source, the direct-to-reverberant energy ratio for the sound source can be obtained for every terminal device 200 in the environment, and the direct energy can then be derived. By ranking the direct sound energy received by each device, the terminal device 200 closest to the sound source position can be identified and used as the master device.
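The ranking step above can be illustrated with a short sketch. It assumes, beyond what the text states, that each device supplies a total received energy and a direct-to-reverberant ratio (DRR) in decibels, and that total energy is the sum of direct and reverberant energy, so that direct energy equals total × r / (1 + r) for a linear ratio r. The estimation of the ratio from the beamformed spectrogram is not shown.

```python
def direct_energy(total_energy, drr_db):
    """Direct-path energy from total energy and a DRR in dB.

    Assumes total = direct + reverberant and DRR = direct / reverberant.
    """
    ratio = 10 ** (drr_db / 10)  # convert dB to a linear energy ratio
    return total_energy * ratio / (1 + ratio)


def pick_master_by_energy(measurements):
    """measurements: device_id -> (total_energy, drr_db).

    Returns the device with the highest direct sound energy, taken here
    as the device closest to the sound source.
    """
    return max(measurements, key=lambda d: direct_energy(*measurements[d]))


# Speaker A receives more total energy, but most of it is reverberation.
master = pick_master_by_energy({
    "speaker-a": (1.0, 0.0),   # DRR of 0 dB: half of the energy is direct
    "speaker-b": (0.8, 10.0),  # DRR of 10 dB: about 91% of the energy is direct
})
```

In this made-up case speaker B is chosen even though speaker A has the larger total energy, which is why the ranking uses direct energy rather than raw level.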
Besides determining the master device from sound energy values as described above, the master device may also be determined in other ways. That is, in some embodiments, the distance between the sound source position and each terminal device 200 may be detected by the terminal devices 200 themselves. A terminal device 200 may capture images of the current environment with a multi-lens camera, build a three-dimensional space model from the images taken at multiple angles, and then extract a human figure from the model by image recognition, thereby locating the user's position in the three-dimensional space model, that is, the sound source position. After locating the sound source position, the terminal device 200 determines the distance between the sound source position and each terminal device 200 according to the layout of the current smart home model, and finally sends the calculated distances to the server 400 so that the server 400 can determine the terminal device 200 closest to the sound source position as the master device.
It can be seen that, in the above embodiments, when more than one terminal device 200 can fulfil the service requirement, the server 400 can have the master device interact further with the user to select, from among the multiple terminal devices 200, the second terminal device that finally executes the voice control response. As a result, no frequent communication among multiple devices takes place before the voice control process, and the response speed of the voice interaction is improved.
Based on the voice control method provided in the above embodiments, the server 400 can determine the second terminal device and, by issuing a response instruction, enable the second terminal device to respond interactively to the voice input by the user. Because the interactive response may cause the second terminal device to perform specific actions that change the device state of the terminal device 200, after sending the response instruction to the second terminal device, the server 400 may also acquire the device state of the second terminal device after it has executed the response instruction, so as to update the stored device state in real time.
That is, as shown in FIG. 15, after sending the response instruction to the second terminal device, the server 400 may receive execution result data reported by the second terminal device. The execution result data includes the new device state after the response instruction has been run. The server 400 then extracts the new device state from the execution result data and uses it to update the device state stored in the storage module.
With the device state update method provided in the above embodiments, the device state stored in the server 400 can be kept consistent with the actual device state of the terminal devices 200 in the voice control system in a timely manner. In subsequent smart voice interaction, the server 400 can therefore screen the terminal devices 200 based on the updated device state and determine the second terminal device more accurately.
Based on the above voice control method, as shown in FIG. 16, some embodiments of the present application further provide a server 400, including a storage module 410, a communication module 420, and a control module 430. The control module 430 is configured to perform the following program steps:
S301: Acquire a voice command input by the user through a terminal device.
S302: In response to the voice command, parse service requirement information from the voice command.
S303: Screen for a second terminal device according to the service requirement information, the second terminal device being a smart device whose device state enables it to fulfil the service requirement information.
S304: Send a response instruction to the second terminal device.
S305: Send a silence instruction to the smart devices in the current voice control system other than the second terminal device.
To cooperate with the above server 400, as shown in FIG. 17, some embodiments of the present application further provide a terminal device 200, including an audio input device, an audio output device, a communicator 220, and a controller 250. The controller 250 is configured to perform the following program steps:
S401: Acquire voice audio data input by the user for performing voice control.
S402: Generate a voice command according to the voice audio data.
S403: Send the voice command to the server, so that the server parses service requirement information from the voice command and screens for a second terminal device according to the service requirement information, the second terminal device being a smart device whose device state enables it to fulfil the service requirement information.
S404: Receive a response instruction or a silence instruction issued by the server.
S405: Run the response instruction or the silence instruction.
It can be seen from the above that the server 400 and the terminal device 200 provided in the above embodiments can form a voice control system for implementing the above voice control method. After the user inputs a voice command, the server 400 may parse service requirement information from the voice command and, according to that information, screen out the second terminal device whose current device state enables it to fulfil the service requirement. The server 400 then sends a response instruction to the second terminal device so that the smart device serving as the second terminal device makes a voice response. At the same time, according to the screening result, the server 400 sends silence instructions to the devices in the current voice control system other than the second terminal device, so that the terminal devices 200 that are not the second terminal device do not respond to the voice control function. The server 400 may preprocess voice commands so that all types of terminal devices 200 can make the correct wake-up response quickly and efficiently within a specified time, which addresses the abnormal responses of traditional voice wake-up methods.
With the wide adoption of smart homes, controlling terminal devices through voice commands has become popular with users. FIG. 18 is an application scenario diagram of a terminal device provided by an embodiment of the present application. As shown in FIG. 18, terminal devices in the home, such as the smart TV 200-5, the smart air conditioner 200-2, the smart refrigerator 200-4, and the smart washing machine 200-3, can connect to the smart terminal 300 and the server 400 through an Internet of Things module. Data can be transmitted between the smart terminal 300 and the terminal devices over a local area network or a wide area network to control and manage the terminal devices. Typically, the Internet of Things module is built into each terminal device.
The first terminal device is generally a device that has certain functions of its own. For a received voice input, if no terminal device is named, the first terminal device treats itself as the device that should respond; if a terminal device to operate is named, it sends a control instruction to that device. Taking a smart speaker as an example: on receiving "fast forward", the smart speaker first determines whether it has an operation corresponding to the command. If so, it performs the fast-forward operation; if not, it returns a prompt indicating that the command was not recognized. On receiving "TV fast forward", the smart speaker first determines whether the TV with which it has a mapping relationship has an operation corresponding to the command. If so, it sends the TV an operation instruction that triggers fast forward; if not, it returns a prompt indicating that the command was not recognized.
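The routing behaviour of the smart speaker described above can be sketched as follows. This is only an illustrative sketch: the table names `LOCAL_OPERATIONS` and `MAPPED_DEVICES`, the function `handle_utterance`, and the convention that a named device appears as a prefix of the utterance are all assumptions made for the example, not details given in the application.

```python
# Hypothetical data: operations the speaker supports itself, and the
# operations of each device it has a mapping relationship with.
LOCAL_OPERATIONS = {"fast forward", "pause"}
MAPPED_DEVICES = {"TV": {"fast forward", "volume up"}}


def handle_utterance(text: str) -> str:
    """Route an utterance to the local device or to a named, mapped device."""
    for name, operations in MAPPED_DEVICES.items():
        prefix = name + " "
        if text.startswith(prefix):
            operation = text[len(prefix):]
            if operation in operations:
                return f"send '{operation}' to {name}"
            return "unrecognized"
    # No device was named, so the receiving device treats the command as its own.
    if text in LOCAL_OPERATIONS:
        return f"execute '{text}' locally"
    return "unrecognized"
```

Under these assumptions, "fast forward" is executed by the speaker, "TV fast forward" is forwarded to the TV, and an operation that neither device supports yields the unrecognized prompt.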
In some embodiments, if the user does not name a second terminal device, the first terminal device may play a prompt telling the user which terminal devices have a mapping relationship with it.
In some embodiments, the mapping relationship between the first terminal device and other terminal devices may be evaluated in a local module or in the cloud. In some embodiments, the local module may be housed in the first terminal device or fixed to it; the two may also be independent objects.
In some embodiments, taking cloud execution as an example, when the user adds a terminal device to the home, the user can assign it to an area so that each terminal device belongs to a fixed area. For example, the smart TV, smart speaker, and smart air conditioner are assigned to the living room area, and the smart refrigerator is assigned to the kitchen area. FIG. 19 is another application scenario diagram of a terminal device provided by an embodiment of the present application. When the user wants to listen to song XXX, the user inputs the wake-up word "Hi! XX" to the first terminal device, which activates the voice application installed on it. The user then inputs the voice message "play song XXX". The first terminal device converts the voice message into a voice command through the voice application and transmits it to the server 400. After receiving the voice command, the server 400 queries the terminal devices currently configured by the user. Having found a TV, a refrigerator, and an air conditioner, it feeds back to the user through the first terminal device: "You have 3 devices, which one do you want to use?" If the user is in the living room, the user may then input the voice message "play it on the one in the living room". The server 400 further determines that the living room contains two terminal devices, the TV and the air conditioner, and feeds back through the first terminal device: "You have 2 devices in the living room, which one do you want to use?" The user then has to input "play it on the TV" through the first terminal device, and only at this point does the server 400 control the TV to play song XXX. The first terminal device may be any device with a sound pickup function, such as a smart remote control or a smart speaker.
When one household has multiple terminal devices as described above, the server 400 has to conduct several rounds of voice interaction with the user, and the user has to actively supply various hints, before the second terminal device that will execute the command is finally determined.
In some embodiments, to improve the user experience, the present application provides a server configured to perform a voice interaction process. The voice interaction process is described below with reference to the accompanying drawings.
In some embodiments, the method executed by the server 400 may instead be executed in another terminal device or in the first terminal device. The following description takes execution by the server as an example.
FIG. 20 is a schematic flowchart of a voice control method provided by an embodiment of the present application. As shown in FIG. 20, the method includes the following steps:
The user's actual environment contains a first terminal device as well as other terminal devices such as a TV, a refrigerator, and an air conditioner. The first terminal device and the other terminal devices can all connect to the server over a network. Through a prior operation on one of the terminal devices, the user logs all of these devices in to the same account, so that the server stores a mapping relationship between the account's user identifier and the terminal devices. At any time, the first terminal device may receive voice input from the user.
In some embodiments, the user can send a voice command to the first terminal device as needed and thereby control the corresponding second terminal device by voice. For example, when the user wants to watch movie XXX on the second terminal device, the user can input the wake-up word "Hi! XX" to the first terminal device and then input the voice message "play movie XXX". It should be noted that the second terminal device and the first terminal device are two independent devices, and the second terminal device may have its own voice receiving apparatus for voice control. The relationship between the second terminal device and the first terminal device is neither that of a TV and a remote control used only for that TV, nor that of a TV processor and the voice receiving apparatus on the TV.
In some embodiments, the wake-up word and the voice command can be input in one continuous utterance. The first terminal device can recognize the voice command contained in a sentence that includes the wake-up word.
In some embodiments, the first terminal device integrates sound pickup and speech parsing functions. The sound pickup function receives the voice message uttered by the user. The speech parsing function extracts the key part of the voice message, that is, the part that reflects the user's intent or what the user wants done. Once the user's intent has been parsed, it is converted into a voice command in an executable command format agreed between the first terminal device and the server 400. The command format may include a command and parameters. After conversion, the voice command is sent to the server 400, which receives it. The corresponding user identifier is added to the voice command so that the server 400 can identify which user sent it, and so that all terminal devices configured by that user can be looked up.
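As an illustration of a command format that carries a command, parameters, and the user identifier, a minimal sketch is shown below. The application does not specify a wire format; the use of JSON, the field names `user_id`, `command`, and `parameters`, and the sample values are all assumptions made for this example.

```python
import json


def build_voice_command(user_id: str, command: str, parameters: dict) -> str:
    """Serialize a parsed intent into a hypothetical agreed command format."""
    payload = {
        "user_id": user_id,        # lets the server identify the sending user
        "command": command,        # the parsed intent, e.g. "play"
        "parameters": parameters,  # details of the intent
    }
    return json.dumps(payload, ensure_ascii=False)


# Example: the voice message "play movie XXX" after parsing.
message = build_voice_command("user-001", "play", {"media_type": "movie", "title": "XXX"})
```

The server side of such an agreement would decode the same structure and read the user identifier before looking up that user's devices.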
In some embodiments, the first terminal device has an independent audio and video display system and can decode and play audio and video streams.
In some embodiments, the second terminal device likewise has an independent audio and video display system and can decode and play audio and video streams.
S2001: Receive a voice command, containing a user identifier, sent by the first terminal device.
In some embodiments, the first terminal device may convert the voice into text and send that text to the server as the voice command.
In some embodiments, the speech conversion service runs on the server. In that case the first terminal device only needs to send the received voice to the server in the agreed encapsulation structure, and the server unpacks the received data to obtain the voice command.
S2002: Look up all terminal devices associated with the user identifier.
In some embodiments, after receiving the voice command, the server 400 parses it to obtain the user identifier and the user intent conveyed by the voice command.
In some embodiments, the server parses the voice command and determines the device type corresponding to the command from the keywords in the parsed text. Here, the device type refers to the function permission that a terminal device has.
In some embodiments, the server may cache a mapping relationship between keywords and device types. In some embodiments, the server may instead store a trained neural network model that maps keywords to device types.
Taking the case where the server caches the device type corresponding to each keyword as an example, see Table 1:

Voice command | Device type | Terminal devices
Play movie | Video playback | Smart TV, smart refrigerator screen
Play song | Audio playback | Smart speaker
Start blowing air | Room temperature adjustment | Smart air conditioner, smart electric fan

Table 1
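A cached keyword lookup of the kind shown in Table 1 could look like the following sketch. The dictionary contents mirror Table 1; the dictionary names, the function `candidate_devices`, and the simple substring match are assumptions for illustration, and a trained model could replace the lookup as the text notes.

```python
# Table 1 expressed as two hypothetical in-memory mappings.
KEYWORD_TO_DEVICE_TYPE = {
    "play movie": "video playback",
    "play song": "audio playback",
    "start blowing air": "room temperature adjustment",
}
DEVICE_TYPE_TO_DEVICES = {
    "video playback": ["smart TV", "smart refrigerator screen"],
    "audio playback": ["smart speaker"],
    "room temperature adjustment": ["smart air conditioner", "smart electric fan"],
}


def candidate_devices(command_text: str):
    """Return (device type, candidate devices) for the first keyword found."""
    text = command_text.lower()
    for keyword, device_type in KEYWORD_TO_DEVICE_TYPE.items():
        if keyword in text:
            return device_type, DEVICE_TYPE_TO_DEVICES[device_type]
    return None, []
```

A command whose text contains none of the cached keywords returns no device type and an empty candidate list.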
At the same time, after obtaining the user identifier, the server 400 searches the database for all terminal devices associated with the user identifier, that is, all terminal devices configured by the user. The reason is that if devices were discovered through near-field communication, a device would find every device it can scan, including the user's own devices and devices of other users who are geographically close, some of which do not belong to the user. Even discovery restricted to the local area network can be wrong, because a guest device may have joined the local area network. By establishing the association between the user identifier and the user's own devices in advance on a server with management functions, the server 400 can determine from the user identifier which devices the user actually owns.
In some embodiments, the terminal devices referred to here are the devices associated with the user identifier other than the first terminal device.
In some embodiments, recognizing the voice command involves determining the type of the voice command and comparing it with the types that each terminal device can execute, in order to determine which terminal devices can execute the voice command.
In some embodiments, the types that each terminal device can execute may be calibrated in advance; for example, the kitchen refrigerator corresponds to freezing, refrigeration, recipe recommendation, and ingredient identification. The types may also be determined from the device identifier when a newly added device is discovered. For example, when the identifier of a newly added device indicates that the device is a refrigerator, the identifier is associated with types such as freezing, refrigeration, recipe recommendation, and ingredient identification.
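Associating a newly added device with its executable types based on its identifier might be sketched as below. The application does not say how a device identifier encodes the kind of device; the `kind-serial` identifier convention, the table `CAPABILITIES_BY_KIND`, and the function `register_device` are hypothetical.

```python
# Hypothetical pre-calibrated types per kind of device.
CAPABILITIES_BY_KIND = {
    "refrigerator": {
        "freezing",
        "refrigeration",
        "recipe recommendation",
        "ingredient identification",
    },
}


def register_device(registry: dict, device_id: str) -> set:
    """Associate a newly added device with the types it can execute."""
    # Assumption: the identifier looks like "refrigerator-01".
    kind = device_id.split("-", 1)[0]
    registry[device_id] = set(CAPABILITIES_BY_KIND.get(kind, set()))
    return registry[device_id]


registry = {}
fridge_types = register_device(registry, "refrigerator-01")
```

A device whose kind is not in the table is registered with an empty set, so it is never selected until its types are calibrated.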
In some embodiments, after obtaining the user identifier, the server 400 may also load all terminal devices associated with the user identifier, together with their device attributes, from the database into a cache. When the server 400 later searches for the second terminal device, it can search the cache directly, which speeds up the search. The device attributes include inherent attributes such as the location of the terminal device, its name, and its terminal device ID.
S2003: When no terminal device is associated with the user identifier, feed back a parameter indicating that no terminal device exists, so that the first terminal device announces that there is no terminal device to execute the voice command.
In some embodiments, if the server 400 finds no terminal device associated with the user identifier, the user has not configured any terminal device. In this case, the server 400 feeds back to the first terminal device a parameter indicating that no terminal device exists, and the first terminal device announces, based on the received parameter, that no terminal device exists.
S2004: When terminal devices associated with the user identifier exist, use preset filtering rules to select the best-matching second terminal device and feed back a parameter identifying it, so that the first terminal device announces that a second terminal device exists to execute the voice command, and control the best-matching second terminal device to execute the voice command.
In some embodiments, the preset filtering rules are preset rules for selecting the devices that match the user's intent, that is, the devices that can execute the voice command. For example, when the user inputs the voice message "play folk music", the first terminal device sends a command to play folk music to the server. The server recognizes that the type of the user's intent is playing music and, based on that type, determines the second terminal device that can perform this type of operation.
In some embodiments, when a terminal device associated with the user identifier exists, that terminal device is used to execute the voice command.
In some embodiments, when terminal devices associated with the user identifier exist, the number of terminal devices and/or the functions they can perform also need to be determined. Here, the functions a terminal device can perform are its function permissions. For example, when the voice command is to play a video, the devices capable of video playback must be identified and the devices that cannot play video are screened out. When the voice command is to cool to 26°, the devices that cannot cool are screened out and a device capable of cooling executes the voice command.
In some embodiments, preset filtering rules are configured in the server 400. The filtering rules represent the mapping relationship between voice commands and terminal devices. The filtering rules contain either a first group of rules and a second group of rules, or only the first group of rules. The first group of rules comprises the rules that are always required to select the best-matching second terminal device. The second group of rules comprises rules that are applied one after another, cumulatively, when the first group has not yet yielded a single best-matching second terminal device.
In some embodiments, the first group of rules includes the mapping relationship between device types and terminal devices, for example the mapping shown in Table 1 in the above embodiment.
In some embodiments, the first group of rules may be a check on the number of terminal devices corresponding to the user identifier.
In some embodiments, if two or more terminal devices still remain after the first group of rules has been applied, the second group of rules is used for further screening. For example, after the candidate second terminal devices have been determined from the mapping between device types and devices: if there is exactly one candidate, the voice command is sent directly to that second terminal device; if there are none, the first terminal device is told that no device can execute the command; if there is more than one, any one or any combination of location, installation time, usage frequency, the device that last executed this type of command, startup time, signal strength, priority, and so on can serve as the second group of rules for filtering.
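The three-way branch on the number of candidates left after the first group of rules can be written as a small sketch. The function name `next_step` and the returned labels are hypothetical and chosen only for this illustration.

```python
def next_step(candidates: list) -> str:
    """Decide what the server does after applying the first group of rules."""
    if len(candidates) == 0:
        # Tell the first terminal device that no device can execute the command.
        return "report_no_device"
    if len(candidates) == 1:
        # Exactly one match: send the voice command straight to it.
        return "send_command"
    # Several matches remain: continue with the second group of rules.
    return "apply_second_group"
```

Only the last branch incurs the cost of evaluating further criteria such as location or usage frequency.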
FIG. 21 is a schematic flowchart of screening terminal devices provided by an embodiment of the present application. With reference to FIG. 21, in some embodiments the second terminal device is screened according to the filtering rules as follows:
S2101: Use the first group of rules to screen the terminal devices currently associated with the user identifier.
In some embodiments, when terminal devices associated with the user identifier exist, the server 400 first screens them using the rules that are required to select the best-matching second terminal device. In some embodiments, the first group of rules includes a terminal device function permission sub-rule. The function permission sub-rule refers to the functions that each terminal device has; for example, a smart TV can play media assets, and an air conditioner can cool and heat. In other words, once the server 400 has identified the terminal devices the user actually owns, it further screens them with the function permission sub-rule and excludes the terminal devices that lack the required function permission.
In some embodiments, when screening terminal devices with the function permission sub-rule, the server 400 checks the function permissions of each of the terminal devices associated with the user identifier. If a device's function permissions are suitable for executing the voice command, the server 400 selects that terminal device as a candidate terminal device. If they are not suitable, the server 400 excludes that terminal device.
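Screening by function permission can be sketched as a simple filter. The device records, the `functions` field, and the function name `filter_by_function` are assumptions for illustration; the application does not define how function permissions are stored.

```python
# Hypothetical device records for one user.
DEVICES = [
    {"name": "smart TV", "functions": {"video playback", "audio playback"}},
    {"name": "smart speaker", "functions": {"audio playback"}},
    {"name": "smart air conditioner", "functions": {"room temperature adjustment"}},
]


def filter_by_function(devices: list, required_function: str) -> list:
    """Keep devices whose function permissions cover the command; exclude the rest."""
    return [d for d in devices if required_function in d["functions"]]


candidates = filter_by_function(DEVICES, "audio playback")
```

With this sample data, an audio playback command keeps the smart TV and the smart speaker and excludes the air conditioner.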
S2102: If none of the terminal devices associated with the user identifier has permission to execute the voice command, feed back a parameter indicating that no terminal device exists, so that the first terminal device announces that there is no terminal device to execute the voice command.
S2103: If terminal devices associated with the user identifier have permission to execute the voice command, determine the number of second terminal devices that have this permission.
In some embodiments, when the first group of rules is used to filter the terminal devices associated with the user identifier and none of the terminal devices the user actually owns has the required function permission, the server 400 feeds back to the first terminal device a parameter indicating that no terminal device exists. The first terminal device announces, based on the received parameter, that no terminal device exists.
In some embodiments, when the first group of rules is used to filter the terminal devices associated with the user identifier and terminal devices the user actually owns do have the required function permission, the server 400 also needs to determine how many terminal devices have that permission.
In some embodiments, when the server 400 detects that exactly one terminal device has permission to execute the voice command, it feeds back to the first terminal device a parameter indicating that a terminal device with the required permission exists, and the first terminal device announces that terminal device based on the parameter.
The server 400 also controls that terminal device to perform the corresponding operation according to the user's intent in the voice command.
In some embodiments, the server 400 sends the voice command to that terminal device, which receives the voice command and performs the corresponding operation.
For example, suppose the user has currently configured only a sweeping robot. If the user inputs the voice message "sweep the floor", the first terminal device transmits the voice message to the server 400. After querying, the server 400 finds that the user has configured only a sweeping robot and that the sweeping robot has a cleaning function. The server 400 therefore controls the first terminal device to announce "the sweeping robot is starting to sweep" and controls the sweeping robot to perform the sweeping function. When the server 400 controls the announcement by the first terminal device, it can feed back the parameter indicating that a terminal device with the required permission exists to the first terminal device over a persistent connection.
In some embodiments, when multiple terminal devices have permission to execute the voice command, the sub-rules in the second group of rules are applied one by one, in descending order of priority, to screen those second terminal devices until a single best-matching second terminal device is selected.
In some embodiments, the second group of rules preset in the server 400 includes a user usage frequency sub-rule, a distance to the first terminal device sub-rule, and a second terminal device priority sub-rule. The user usage frequency sub-rule refers to the number of times the user has executed similar voice commands through a second terminal device, for example the number of times the user has played video data such as movie A or variety show B on the smart TV, or audio data such as song C or song D on the smart speaker. Movie A, variety show B, song C, and song D can all be classified as media asset data. Accordingly, the user's commands "play movie A", "play variety show B", "play song C", and "play song D" can be regarded as similar voice commands, all being playback voice commands. They can be further divided: "play movie A" and "play variety show B" are video playback voice commands, while "play song C" and "play song D" are audio playback voice commands. The distance to the first terminal device sub-rule refers to the distance between each second terminal device and the first terminal device. The second terminal device priority sub-rule refers to the priority that the user has set for each terminal device when executing voice commands; for example, for voice commands that play media assets, the smart TV has a higher priority than the smart refrigerator screen. The user can set terminal device priorities through an application on the smart terminal 300.
In some embodiments, the user can assign a priority to each sub-rule in the second group of rules, for example setting the user usage frequency sub-rule above the distance to the first terminal device sub-rule, or setting the terminal device priority sub-rule above the user usage frequency sub-rule.
In some embodiments, the purpose of filtering terminal devices with the filtering rules is to select the one second terminal device best suited to executing the current voice command; that is, when screening ends, exactly one best-matching second terminal device remains. After screening with the first group of rules, as long as more than one terminal device remains, the server 400 applies the sub-rules in the second group of rules to the remaining terminal devices one by one in order of priority.
For example, after the user inputs "play song A", the server 400 finds that the user has configured a smart TV, a smart speaker, a smart air conditioner, a smart refrigerator, and a smart washing machine. After filtering the terminal devices with the function permission sub-rule, the server 400 finds that the smart TV, the smart speaker, and the smart refrigerator have a playback function, so these three devices need to be screened further with the second group of rules. The server 400 next filters with the distance to the first terminal device sub-rule. From the device location in the device attributes, the server 400 determines that the smart TV, the smart speaker, and the first terminal device are all in the living room and therefore close to each other, while the smart refrigerator is in the kitchen and farther from the first terminal device. The smart refrigerator is therefore excluded. Two candidate terminal devices remain, the smart TV and the smart speaker, so further screening is needed. The server 400 continues with the terminal device priority sub-rule. For playback voice commands, the user has set the smart TV to a higher priority than the smart speaker, so the server 400 selects the smart TV as the best-matching terminal device.
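The "play song A" example can be reproduced with a small sketch that applies the function permission filter and then the second-group sub-rules in priority order until one device remains. The device records, the numeric `distance` and `priority` fields (lower is better), and the names `keep_best` and `select_device` are assumptions for illustration; in particular, approximating distance by room (0 for the living room, 1 for the kitchen) is a simplification made here.

```python
DEVICES = [
    {"name": "smart TV", "functions": {"playback"}, "distance": 0, "priority": 1},
    {"name": "smart speaker", "functions": {"playback"}, "distance": 0, "priority": 2},
    {"name": "smart air conditioner", "functions": {"room temperature adjustment"},
     "distance": 0, "priority": 1},
    {"name": "smart refrigerator", "functions": {"playback", "refrigeration"},
     "distance": 1, "priority": 3},
    {"name": "smart washing machine", "functions": {"washing"}, "distance": 1, "priority": 1},
]

# Sub-rules in descending order of priority: distance first, then device priority.
SUB_RULES = [lambda d: d["distance"], lambda d: d["priority"]]


def keep_best(devices, score):
    """Keep every device tied for the lowest (best) score under one sub-rule."""
    best = min(score(d) for d in devices)
    return [d for d in devices if score(d) == best]


def select_device(devices, required_function, sub_rules):
    """Return the single best-matching device name, or None if undetermined."""
    candidates = [d for d in devices if required_function in d["functions"]]
    for rule in sub_rules:
        if len(candidates) <= 1:
            break
        candidates = keep_best(candidates, rule)
    return candidates[0]["name"] if len(candidates) == 1 else None


chosen = select_device(DEVICES, "playback", SUB_RULES)
```

With this data the function filter keeps the TV, speaker, and refrigerator; the distance sub-rule removes the refrigerator; the priority sub-rule then leaves the smart TV, matching the outcome in the text.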
As another example, after the user inputs "lower the room temperature", the server 400 finds that the user has configured a smart TV, a smart speaker, a smart air conditioner, a smart refrigerator, and a smart electric fan. After filtering the terminal devices with the function permission sub-rule, the server 400 finds that the smart air conditioner and the smart electric fan have a room temperature adjustment function, so these two devices need to be screened further with the second group of rules. The server 400 continues with the terminal device priority sub-rule. For room temperature adjustment voice commands, the user has set the smart air conditioner to a higher priority than the smart electric fan, so the server 400 selects the smart air conditioner as the best-matching terminal device. The user can of course set the smart electric fan to a higher priority than the smart air conditioner if preferred.
In some embodiments, when screening terminal devices with the user usage frequency sub-rule, the server 400 detects the execution frequency of each of the multiple second terminal devices associated with the user identifier, where the execution frequency is the number of times a second terminal device has executed similar voice commands in its history. The server 400 retains the second terminal device with the highest execution frequency.
In some embodiments, when screening terminal devices with the sub-rule on distance from the first terminal device, the server 400 detects the distance between the first terminal device and each of the multiple second terminal devices associated with the user identifier. The server 400 retains the second terminal device closest to the first terminal device.
In some embodiments, when screening terminal devices with the terminal device priority sub-rule, the server 400 retains the second terminal device with the highest priority according to the terminal device priorities set by the user.
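The cascade of sub-rules described above can be sketched roughly as follows. This is a minimal illustration, not the application's implementation: the `Device` fields, the use of "same room" as a stand-in for distance, the order in which the three sub-rules run, and the rooms assigned to the air conditioner and washing machine are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    functions: set                                    # functions the device supports
    room: str                                         # stands in for device location
    priority: dict = field(default_factory=dict)      # command type -> user-set priority
    usage_count: dict = field(default_factory=dict)   # command type -> past executions


def keep_best(devices, score):
    """Keep only the devices that reach the highest score."""
    best = max(score(d) for d in devices)
    return [d for d in devices if score(d) == best]


def select_device(devices, command_type, first_device_room):
    # First rule group: function permission.
    candidates = [d for d in devices if command_type in d.functions]
    # Second rule group, applied only while more than one candidate remains.
    # The ordering below is an assumption for illustration.
    sub_rules = [
        lambda d: d.usage_count.get(command_type, 0),        # usage frequency
        lambda d: 1 if d.room == first_device_room else 0,   # distance (same room = near)
        lambda d: d.priority.get(command_type, 0),           # user-set priority
    ]
    for score in sub_rules:
        if len(candidates) <= 1:
            break
        candidates = keep_best(candidates, score)
    return candidates[0] if candidates else None


# Hypothetical household mirroring the "play song A" example.
home = [
    Device("smart TV", {"play"}, "living room", {"play": 2}),
    Device("smart speaker", {"play"}, "living room", {"play": 1}),
    Device("smart air conditioner", {"cool"}, "bedroom"),
    Device("smart refrigerator", {"play", "chill"}, "kitchen"),
    Device("smart washing machine", {"wash"}, "balcony"),
]
chosen = select_device(home, "play", "living room")
```

With this data the refrigerator is dropped by the distance sub-rule and the speaker by the priority sub-rule, leaving the smart TV.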
The above embodiments apply not only to home scenarios but also to office and other scenarios.
In this application, when the server responds to a voice command issued by the user and finds that multiple terminal devices are currently available, it can select, based on the filtering rules, the second terminal device that will finally execute the command. This spares the user multiple rounds of voice interaction with the first terminal device and improves the user experience.
In some embodiments, the present application further provides a voice control method, which includes the following. The server 400 receives a voice command that contains a user identifier and is sent by the first terminal device, and looks up all terminal devices associated with the user identifier. When no terminal device is associated with the user identifier, the server 400 feeds back a parameter indicating that no terminal device exists, so that the first terminal device announces that there is no terminal device to execute the voice command. When terminal devices associated with the user identifier exist, the server 400 uses preset filtering rules to select the best-matching terminal device, feeds back a parameter identifying the best-matching terminal device so that the first terminal device announces that a best-matching terminal device is available, and controls the best-matching terminal device to execute the voice command.
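The server-side flow of this method might look like the sketch below. The function name, the `devices_by_user` mapping, the `pick_best` callback, and the shape of the returned dictionary are hypothetical; the source only states that a parameter is fed back in each case.

```python
def handle_voice_command(user_id, command, devices_by_user, pick_best):
    """Return the feedback the server would send to the first terminal device."""
    devices = devices_by_user.get(user_id, [])
    if not devices:
        # No device is associated with this user identifier.
        return {"status": "no_device"}
    # Apply the preset filtering rules (supplied here as a callback).
    best = pick_best(devices, command)
    return {"status": "matched", "device": best, "command": command}


# Hypothetical registry and a trivial rule that picks the first device.
registry = {"user-1": ["smart TV", "smart speaker"]}
reply = handle_voice_command("user-1", "play song A", registry, lambda ds, c: ds[0])
```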
FIG. 22 is a schematic diagram of a voice control scenario provided by an embodiment of the present application. As shown in FIG. 22, assume that the terminal devices in the smart home scenario include terminal device 200-4 (a smart refrigerator), terminal device 200-3 (a smart washing machine), and terminal device 200-5 (a smart display device). When the user wants to control a terminal device in the smart home scenario, the user first records through the recording application on the first terminal device, that is, the smart terminal 200-1, to obtain recording data. The recording data mainly expresses the user's control intention, and the control intention does not name the second terminal device to be controlled. The smart terminal 200-1 sends the recording data to the server 400; the server 400 recognizes the voice command to obtain the specific control information corresponding to it, determines from that control information the second terminal device the user actually wants to control, and directly controls that second terminal device to execute the corresponding control instruction. In this case, voice control of the terminal device is achieved through interaction between the server 400 and the smart terminal 200-1. Alternatively, the user can record the voice through a local server 400, for example through the recording module of an Internet of Things terminal; again, the voice mainly expresses the user's control intention, which does not name the second terminal device to be controlled. The local server 400 recognizes the voice command entered by the user to obtain the specific control information corresponding to it, determines from that control information the second terminal device the user actually wants to control, and directly controls that second terminal device to execute the corresponding control instruction.
In the above process, the user only needs to record the voice; the rest of the processing requires no user participation. Terminal devices can thus be controlled automatically by voice to execute the corresponding control instructions, which makes smart home devices easier to control and use and helps improve intelligence and accuracy.
It should be noted that a smart home scenario may contain many kinds of terminal devices; FIG. 22 is only illustrative and does not limit the type or number of smart devices.
The voice control method provided in the embodiments of the present application may be implemented on a computer device, or on a functional module or functional entity within a computer device.
The computer device may be a personal computer (PC), a server, a mobile phone, a tablet computer, a notebook computer, a mainframe computer, or the like; the embodiments of the present application do not specifically limit this.
The voice control method provided in the embodiments of the present application may be implemented on the above computer device.
The voice control process provided in the embodiments of the present application may be implemented on the above computer device. The method recognizes the user's voice command to obtain the control information corresponding to it, where the control information includes a function category and a control instruction. It then determines, from a pre-established terminal device information table, a first candidate terminal device set corresponding to the function category. Next, based on the function state of each candidate terminal device in the first candidate terminal device set, it determines a second candidate terminal device set that matches the control instruction. Finally, it determines from the second candidate terminal device set the second terminal device that matches the control instruction and controls the second terminal device to execute the control instruction. Terminal devices are thus controlled automatically by voice to execute the corresponding control instructions, which makes smart home devices easier to control and use and helps improve intelligence and accuracy.
To describe this solution in more detail, it is explained below by way of example with reference to FIG. 23A. It can be understood that, in an actual implementation, the steps shown in FIG. 23A may include more or fewer steps, and their order may differ, provided that the voice control method of the embodiments of the present application can be realized.
FIG. 23A is a schematic flowchart of a voice control method provided by an embodiment of the present application, and FIG. 23B is a schematic diagram of the principle of that method. This embodiment is applicable to controlling the terminal devices in a smart home scenario. The method of this embodiment can be executed by a voice control apparatus, which can be implemented in hardware and/or software and can be configured in a computer device.
As shown in FIG. 23A, the method specifically includes the following steps:
S2301. Recognize the user's voice command to obtain the corresponding control information, where the control information includes a function category and a control instruction.
Here, the voice command can be understood as the data formed after the user records. The control information can be understood as the control intention corresponding to the user's voice command; it contains the function category and the control instruction related to a terminal device, but not the specific second terminal device to be controlled. Terminal devices can be understood as the various devices in a smart home scenario, such as audio and video equipment, lighting systems, curtain controls, air conditioning controls, digital cinema systems, audio and video servers, and networked home appliances. The function category can be understood as the category to which a specific function of a terminal device belongs; for example, the categories for a smart TV may include volume, brightness, video playback scenes, and recipe scenes. The control instruction can be understood as an operation instruction for a terminal device, such as turn on, turn off, play, or pause.
In a smart home scenario containing multiple terminal devices of different types, each terminal device is in a different control state, and the user needs to know the specific control state of each terminal device in order to control it. A voice command that does not specify a terminal device may therefore fail to execute, or may require prompting the user several times for additional information in order to determine the terminal device to be controlled.
The execution subject in this embodiment may be a local control device 200 with processing and interaction capabilities, such as an Internet of Things terminal, or the server 400 that interacts with the smart terminal 300. After the user's recording data is obtained, neither the local control device 200 nor the server 400 can directly read the specific information contained in the voice command, so the voice command must be recognized. Recognition may use a speech recognition method and a semantic understanding method, or a neural network model, a speech recognition system, or a similar approach; this embodiment imposes no specific limit. After recognition, the control information corresponding to the voice command is obtained.
S2302. Determine, from the pre-established terminal device information table, the first candidate terminal device set corresponding to the function category.
Here, the terminal device information table can be understood as a pre-established table of information about each terminal device in the smart home scenario. The table may include, for each terminal device, its device identification number, device name, function category, function state, and so on. The first candidate terminal device set can be understood as the set of terminal devices in the smart home scenario that match the function category.
After the control information corresponding to the voice command is obtained, the function category of each terminal device in the terminal device information table is matched against the function category in the control information, which yields the first candidate terminal device set corresponding to that function category.
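The table lookup in S2302 can be illustrated with a small sketch. The row layout (one row per device and function category), the field names, and the sample rows are assumptions made for the example; the source only lists the kinds of fields the table may hold.

```python
# Hypothetical terminal device information table: one row per (device, function category).
DEVICE_TABLE = [
    {"id": "dev-1", "name": "smart TV", "category": "volume", "state": "normal"},
    {"id": "dev-1", "name": "smart TV", "category": "video playback", "state": "off"},
    {"id": "dev-2", "name": "smart speaker", "category": "volume", "state": "normal"},
    {"id": "dev-3", "name": "smart air conditioner", "category": "temperature", "state": "off"},
]


def first_candidate_set(table, category):
    """Return the rows whose function category matches the one in the control information."""
    return [row for row in table if row["category"] == category]


volume_candidates = first_candidate_set(DEVICE_TABLE, "volume")
```

Keeping the whole row, rather than only the device identifier, leaves the function state available for the next filtering step.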
S2303. Based on the function state of each candidate terminal device in the first candidate terminal device set, determine the second candidate terminal device set that matches the control instruction.
Here, the second candidate terminal device set can be understood as the set of terminal devices in the smart home scenario that match the control instruction; it is the candidate set from which the second terminal device the user wants to control is finally determined.
The first candidate terminal device set may contain multiple candidate terminal devices, each possibly in a different function state. Therefore, after the first candidate terminal device set is obtained, the range must be narrowed further to determine the terminal device the user wants to control. To do so, the function state of each candidate terminal device in the first candidate terminal device set is compared with the control instruction in the control information, and the candidate terminal devices that match the control instruction form the second candidate terminal device set.
For example, assume the control instruction is "turn on", the function state of candidate terminal device 1 is "normal", the function state of candidate terminal device 2 is "playing", and the function state of candidate terminal device 3 is "off". Candidate terminal device 3 is then added to the second candidate terminal device set.
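The state comparison in S2303 can be sketched as below. The `APPLICABLE_STATES` mapping, which says in which states an instruction makes sense, is an assumption introduced for illustration; the source gives only the "turn on" example.

```python
# Assumed mapping: a control instruction applies only to devices in these states.
APPLICABLE_STATES = {
    "turn_on": {"off"},
    "turn_off": {"normal", "playing"},
    "pause": {"playing"},
}


def second_candidate_set(first_set, instruction):
    """first_set maps a candidate device name to its current function state."""
    wanted = APPLICABLE_STATES.get(instruction, set())
    return [name for name, state in first_set.items() if state in wanted]


first_set = {"candidate 1": "normal", "candidate 2": "playing", "candidate 3": "off"}
to_turn_on = second_candidate_set(first_set, "turn_on")
```

For the "turn on" example in the text, only candidate 3 (currently off) survives.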
S2304. Determine, from the second candidate terminal device set, the second terminal device that matches the control instruction, and control the second terminal device to execute the control instruction.
Here, the second terminal device can be understood as the terminal device that matches the control instruction.
Because the second candidate terminal device set may contain multiple terminal devices, the second terminal device that matches the control instruction still has to be determined from that set. There may be more than one second terminal device, depending on the situation; this application imposes no specific limit. Once the second terminal device is determined, the corresponding control instruction is sent to it so that it executes the control instruction, thereby meeting the user's needs and accurately controlling the second terminal device in line with the user's voice command.
Optionally, FIG. 23C is a schematic diagram of the process of determining the second candidate terminal device set in an embodiment of the present application. As shown in FIG. 23C:
1. Determine the overall set of device names, the overall set of function categories, and the overall set of function states for all terminal devices.
The overall set of device names is defined as Dev, with individual terminal devices Dev1, Dev2, Dev3, …;
the overall set of function categories is defined as F, with individual functions F1, F2, F3, …;
the overall set of function states is defined as S, with individual function states S1, S2, S3, ….
2. Determine the combinations of function category and function state that each terminal device contains. The corresponding sets are as follows:
Dev1={F1S1, F2S2, …}
Dev2={F1S2, F2S3, F3S3, …}
Dev3={F1S1, F2S3, F4S1, F5S2, …}
Dev4={F3S3, F3S3, F5S5, …}
3. Recognize the user's voice command to obtain the combination FxSy corresponding to the function category and the control instruction.
4. Query the sets from step 2, and determine the second candidate terminal device set from the terminal devices whose sets contain an element equal to FxSy.
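Steps 1 to 4 amount to a membership test over per-device sets of (function, state) pairs. The sketch below uses the four example sets from the text with their elided elements ("…") simply left out; the tuple representation and the function name are assumptions for illustration.

```python
# Each device maps to the set of (function category, function state) pairs it holds.
DEVICE_COMBOS = {
    "Dev1": {("F1", "S1"), ("F2", "S2")},
    "Dev2": {("F1", "S2"), ("F2", "S3"), ("F3", "S3")},
    "Dev3": {("F1", "S1"), ("F2", "S3"), ("F4", "S1"), ("F5", "S2")},
    "Dev4": {("F3", "S3"), ("F5", "S5")},
}


def match_combo(device_combos, fx, sy):
    """Return the devices whose set contains the recognized pair FxSy."""
    return sorted(dev for dev, combos in device_combos.items() if (fx, sy) in combos)


second_set = match_combo(DEVICE_COMBOS, "F2", "S3")
```

If the recognized pair is F2S3, the second candidate terminal device set is Dev2 and Dev3.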
In this embodiment, the user's voice command is first recognized to obtain the control information corresponding to it, where the control information includes a function category and a control instruction. The first candidate terminal device set corresponding to the function category is then determined from the pre-established terminal device information table. Next, based on the function state of each candidate terminal device in the first candidate terminal device set, the second candidate terminal device set that matches the control instruction is determined. Finally, the second terminal device that matches the control instruction is determined from the second candidate terminal device set and is controlled to execute the control instruction. Terminal devices are thus controlled automatically by voice to execute the corresponding control instructions, which makes smart home devices easier to control and use and helps improve intelligence and accuracy.
In some embodiments, optionally, the terminal device information table is obtained as follows:
obtain the device name, function name, function category, and function state of each terminal device included in a preset scenario;
build or update the corresponding terminal device information table from all the device names, function names, function categories, and function states.
Here, the preset scenario can be understood as a scenario that contains multiple terminal devices interconnected through a network, such as a smart home scenario or a smart office scenario.
Specifically, the device name, function name, function category, and function state of each terminal device in the preset scenario can be obtained through terminal device information reporting, or by other means. Once the information for each terminal device is obtained, the corresponding terminal device information table can be built from all the device names, function names, function categories, and function states, and the table can be updated promptly whenever at least one of these items changes.
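Building and updating the table from device reports could be sketched as an upsert keyed on device and function. The report field names and the choice of key are assumptions; the source does not specify a report format.

```python
def apply_report(table, report):
    """Insert or update one reported row; return True if the table changed."""
    key = (report["device_name"], report["function_name"])
    changed = table.get(key) != report
    table[key] = dict(report)  # store a copy so later edits to `report` do not leak in
    return changed


table = {}
report = {
    "device_name": "smart TV",
    "function_name": "power",
    "function_category": "switch",
    "function_state": "off",
}
created = apply_report(table, report)      # first report builds the row
unchanged = apply_report(table, report)    # identical report changes nothing
report["function_state"] = "on"
updated = apply_report(table, report)      # state change updates the row
```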
In this embodiment, building or updating the terminal device information table in this way keeps the table consistent with the actual function state of each terminal device, which helps in determining the first candidate terminal device set and ensures the accuracy of that set.
In some embodiments, optionally, the method further includes:
if the first candidate terminal device set is empty, or if the first candidate terminal device set is non-empty and the second candidate terminal device set is empty, sending second prompt information, where the second prompt information instructs the user to determine the second terminal device from multiple terminal devices;
receiving second response information, where the second response information contains second identification information corresponding to the second terminal device;
controlling the second terminal device corresponding to the second identification information to execute the control instruction.
Here, the first candidate terminal device set being empty means that the set contains no qualifying candidate terminal device. Likewise, the second candidate terminal device set being empty means that the set contains no qualifying terminal device.
Specifically, if the first candidate terminal device set is empty, or if the first candidate terminal device set is non-empty and the second candidate terminal device set is empty, the terminal device the user actually wants to control cannot currently be determined. In that case, second prompt information can be sent. For example, the local control device 200 may send the second prompt information to its own display or audio application, which displays or plays it to instruct the user to determine the second terminal device from the multiple terminal devices; or the server 400 may send the second prompt information to the smart terminal 300 for the same purpose. The second response information fed back by the user is then received. Because the second response information contains the second identification information corresponding to the second terminal device, the second terminal device corresponding to that identification information can be controlled directly to execute the control instruction.
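The empty-set fallback can be sketched as follows. The `ask_user` callback is a hypothetical stand-in for sending the second prompt information and receiving the second response information; how the prompt is actually delivered is outside this sketch.

```python
def resolve_when_empty(first_set, second_set, all_devices, ask_user):
    """Ask the user to pick a device when automatic matching found nothing."""
    if first_set and second_set:
        return None  # automatic matching succeeded; no prompt is needed
    # Covers both conditions: first set empty, or first non-empty and second empty.
    chosen_id = ask_user(sorted(all_devices))
    return chosen_id if chosen_id in all_devices else None


known = {"dev-1", "dev-2", "dev-3"}
picked = resolve_when_empty(["dev-1"], [], known, lambda options: options[1])
```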
In this embodiment, when the terminal device the user actually wants to control cannot currently be determined, the above method still allows the second terminal device to be determined, thereby meeting the user's control needs and improving the user experience.
FIG. 24A is a schematic flowchart of another voice control method provided by an embodiment of the present application, and FIG. 24B is a schematic diagram of the principle of that method. This embodiment further extends and optimizes the embodiments above. Optionally, one possible implementation of S2304 in this embodiment is as follows:
S23041. Determine the number of second candidate terminal devices contained in the second candidate terminal device set that matches the control instruction.
Because the second candidate terminal device set may contain multiple second candidate terminal devices, determining the second terminal device that matches the control instruction requires first counting the second candidate terminal devices in the set, so that the second terminal device can subsequently be determined from the set according to how that number compares with a preset threshold.
S23042. According to how the number compares with the preset threshold, determine the second terminal device from the second candidate terminal device set, and control the second terminal device to execute the control instruction.
Here, the preset threshold may be a predetermined value, such as 1 or 3, or may be set according to the specific situation; this embodiment imposes no specific limit.
After the number of second candidate terminal devices in the second candidate terminal device set is obtained, it is compared with the preset threshold, and the second terminal device is determined from the set according to the result. For example, all second candidate terminal devices in the set may be determined as second terminal devices, or only some of them. After the second terminal device is determined, it is controlled to execute the control instruction, so that the smart home is controlled by voice and the user has fewer operations to perform.
In this embodiment, determining the second terminal device in this way is simple and fast and can improve efficiency.
FIG. 25A is a schematic flowchart of yet another voice control method provided by an embodiment of the present application, and FIG. 25B is a schematic diagram of the principle of that method. This embodiment further extends and optimizes the embodiments above. Optionally, one possible implementation of S23042 in this embodiment is as follows:
S230421. Determine whether the number of second candidate terminal devices is less than or equal to the preset threshold.
After the number of second candidate terminal devices in the second candidate terminal device set is obtained, comparing it with the preset threshold shows whether the number is less than or equal to the preset threshold.
If so, execute S230422-S230423; if not, execute S230424-S230426.
S230422. Determine all second candidate terminal devices in the second candidate terminal device set as second terminal devices.
If the number of second candidate terminal devices is less than or equal to the preset threshold, the number has not exceeded the upper limit, so all second candidate terminal devices in the second candidate terminal device set are determined as second terminal devices.
S230423. Control each second terminal device to execute the control instruction.
After all second candidate terminal devices in the second candidate terminal device set are determined as second terminal devices, the control instruction is sent to each second terminal device so that each of them executes it.
S230424. Send first prompt information, where the first prompt information instructs the user to determine the second terminal device from the second candidate terminal device set.
If the number of second candidate terminal devices is greater than the preset threshold, the number has exceeded the upper limit. To avoid executing the same control instruction on many terminal devices at once and thereby disrupting the user's normal use, first prompt information needs to be sent. For example, the local control device 200 may send the first prompt information to its own display or audio application, which displays or plays it to instruct the user to determine the second terminal device from the multiple terminal devices; or the server 400 may send the first prompt information to the smart terminal device 204 for the same purpose.
S230425. Receive first response information, where the first response information contains first identification information corresponding to the second terminal device.
The first response information fed back by the user is received so that the second terminal device corresponding to the first identification information can subsequently be controlled to execute the control instruction.
S230426. Control the second terminal device corresponding to the first identification information to execute the control instruction.
Because the first response information contains the first identification information corresponding to the second terminal device, the second terminal device corresponding to that identification information can be controlled directly to execute the control instruction.
在本实施例中,根据第二候选终端设备集合中包含的所有第二候选终端设备的数量与预设阈值之间的两种大小关系,分别执行相应的步骤,能够进一步提高智能家居语音控制过程的智能性和准确性。In this embodiment, according to the two size relationships between the number of all second candidate terminal devices contained in the second candidate terminal device set and the preset threshold, corresponding steps are respectively performed, which can further improve the smart home voice control process. intelligence and accuracy.
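The branch on the preset threshold described in steps S230423 to S230426 can be sketched as follows. This is only an illustrative sketch, not the implementation of the application: the function name `select_targets`, the `ask_user` callback (standing in for sending the first prompt information and receiving the first response information), and the use of plain strings as identification information are all assumptions.

```python
from typing import Callable, List


def select_targets(
    candidates: List[str],
    threshold: int,
    ask_user: Callable[[List[str]], str],
) -> List[str]:
    """Return the identifiers of the second terminal devices to control.

    candidates: identifiers in the second candidate terminal device set.
    threshold:  the preset threshold (upper limit on simultaneous targets).
    ask_user:   hypothetical callback that prompts the user with the
                candidates and returns the identifier the user chose.
    """
    if len(candidates) <= threshold:
        # At or below the limit: every candidate becomes a target (S230423).
        return list(candidates)

    # Above the limit: prompt the user and read the response (S230424, S230425).
    chosen = ask_user(candidates)
    if chosen not in candidates:
        raise ValueError(f"unknown device identifier: {chosen!r}")
    # Only the device named in the response is controlled (S230426).
    return [chosen]
```

With a threshold of 1, a single candidate is controlled directly, while two candidates trigger the prompt and only the device the user names is controlled.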
FIG. 26A is a schematic flowchart of yet another voice control method provided by an embodiment of the present application, and FIG. 26B is a schematic diagram of the principle of yet another voice control method provided by an embodiment of the present application. This embodiment further extends and optimizes the foregoing embodiments. Optionally, one possible implementation of S2301 in this embodiment is as follows:
S23011. Perform text recognition on the voice command by a speech recognition method to obtain text information corresponding to the voice command.
Here, the speech recognition method is a method for converting speech into text, for example one implemented by speech recognition software.
The speech recognition method performs text recognition on the voice command and thereby yields the text information corresponding to the voice command.
S23012. Perform semantic understanding on the text information by a semantic understanding method to obtain the control information contained in the text information, where the control information includes a function category and a control instruction.
Here, the semantic understanding method may include a keyword extraction method, an information extraction method, and the like.
The machine-recognized text information may contain redundant or repeated information. To further improve the accuracy of the recognition process, semantic understanding is therefore performed on the text information by the semantic understanding method to obtain the control information contained in it, where the control information includes a function category and a control instruction.
In this embodiment, the control information obtained by the above method is more accurate and closer to the actual situation, which helps ensure that the subsequent process proceeds smoothly.
Exemplarily, FIG. 26C is a schematic diagram of the process of obtaining the control information in an embodiment of the present application. As shown in FIG. 26C:
First, speech recognition is performed on the voice command to obtain first information; semantic understanding is then performed on the first information to obtain the control information.
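As a rough illustration of step S23012, the sketch below maps already-recognized text to control information by keyword extraction. It assumes the speech recognition step (S23011) has produced the text; the rule table `KEYWORD_RULES`, the `ControlInfo` type, and the function `understand` are hypothetical names introduced for illustration, and a real semantic understanding service would likely be far richer than substring matching.

```python
from typing import NamedTuple, Optional


class ControlInfo(NamedTuple):
    category: str      # function category, e.g. "brightness"
    instruction: str   # control instruction, e.g. "increase"


# Hypothetical keyword rules: phrase found in the text -> control information.
KEYWORD_RULES = {
    "too dark": ControlInfo("brightness", "increase"),
    "too loud": ControlInfo("volume", "decrease"),
    "close the door": ControlInfo("door", "close"),
}


def understand(text: str) -> Optional[ControlInfo]:
    """Extract control information from recognized text, or None if no rule matches."""
    # Lowercase and collapse whitespace so redundant spacing does not block a match.
    normalized = " ".join(text.lower().split())
    for keyword, info in KEYWORD_RULES.items():
        if keyword in normalized:
            return info
    return None
```

For instance, the recognized text "It is too dark" would yield the function category "brightness" and the control instruction "increase", matching Example 1 later in this section.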
Exemplarily, FIG. 27A is a schematic structural diagram of a local control device in an embodiment of the present application. As shown in FIG. 27A:
The local control device 200 includes a speech recognition service, a semantic understanding service, and a terminal device control service (referred to below as the home control service). The speech recognition service is mainly used to record audio and to recognize the user's voice command to obtain a recognition result. The semantic understanding service is mainly used to determine the control information according to the recognition result. The home control service is used to maintain the terminal device information table, receive the device information reported by the terminal devices, and control the corresponding terminal device according to the control information.
FIG. 27B is a schematic structural diagram of the interaction between a local control device and terminal devices in an embodiment of the present application. As shown in FIG. 27B:
The speech recognition service includes a recording module and a recognition engine: the recording module records audio, and the recognition engine recognizes the user's voice command to obtain a recognition result. The semantic understanding service covers function categories and control instructions. The home control service covers the terminal device information table, determination of the second terminal device, and voice command control. The home control service interacts with each terminal device, for example terminal device A, terminal device B, ..., terminal device N. It builds the terminal device information table from the device information reported by each terminal device; determines the first candidate terminal device set according to the terminal device information table and the function category; determines the second candidate terminal device set according to the first candidate terminal device set and the control instruction; and determines the second terminal device from the second candidate terminal device set and controls it to execute the control instruction, so that the smart home is controlled by voice command. Each terminal device is responsible for reporting its device information and for receiving and executing the control instructions issued by the home control service.
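The table-maintenance role of the home control service can be pictured with a small sketch. The class and method names below (`HomeControlService`, `on_report`, `ids_for_category`, `state_of`) are assumptions made for illustration; the application does not specify an interface or a data layout.

```python
from typing import Dict, List, NamedTuple


class DeviceInfo(NamedTuple):
    name: str
    category: str   # function category
    state: str      # function state


class HomeControlService:
    """Keeps a terminal device information table keyed by device identifier."""

    def __init__(self) -> None:
        self._table: Dict[int, DeviceInfo] = {}

    def on_report(self, device_id: int, info: DeviceInfo) -> None:
        # A newer report from the same device overwrites the older row.
        self._table[device_id] = info

    def ids_for_category(self, category: str) -> List[int]:
        # Identifiers of devices whose function category matches, in ascending order.
        return sorted(i for i, info in self._table.items() if info.category == category)

    def state_of(self, device_id: int) -> str:
        return self._table[device_id].state
```

Keying the table by device identifier means a repeated report simply refreshes the stored function state, which is what the later state-based filtering relies on.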
In some embodiments, assume that the terminal device information table is as shown in Table 2 below:
Device ID | Device name | Function category | Function state
1 | Curtain | Brightness | Closed
2 | Television | Brightness | Normal
3 | Television | Volume | Normal
4 | Television | Recipe scene | Showing UI
5 | Smart speaker | Music scene | Playing
6 | Oven | Food scene | Steak placed
7 | Refrigerator | Door | Open
8 | Washing machine | Door | Closed
Table 2
The information recorded for the terminal devices in Table 2 is as follows:
1. The curtain is closed.
2. The television has normal brightness and normal volume and is showing the recipe UI.
3. The smart speaker is playing music.
4. A steak has been placed in the oven, which is waiting for cooking to start.
5. The refrigerator door is open.
6. The washing machine door is closed.
It should be noted that every device included in Table 2 is a terminal device.
Example 1: Assume the voice command is "It's too dark", the function category is brightness, the control instruction is increase, and the preset threshold is 1. From the function category and Table 2, the first candidate terminal device set is determined to be: 1 curtain and 2 television. From the function state of each terminal device in the first candidate terminal device set and the control instruction, the second candidate terminal device set is determined to be: 1 curtain. Because the number of second candidate terminal devices in the second candidate terminal device set equals the preset threshold, the second terminal device is determined to be 1 curtain, and the curtain is controlled to open.
Example 2: Assume the voice command is "I want to grill a steak", the function categories are food scene and recipe scene, the control instructions are cook and query, and the preset threshold is 1. From the function categories and Table 2, the first candidate terminal device set is determined to be: 4 television and 6 oven. From the function state of each terminal device in the first candidate terminal device set and the control instructions, the second candidate terminal device set is determined to be: 6 oven. Because the number of second candidate terminal devices in the second candidate terminal device set equals the preset threshold, the second terminal device is determined to be 6 oven, and the oven is controlled to grill the steak.
In some embodiments, if it is detected that the oven does not contain a steak but the television supports the recipe query function, the television presents a recipe for grilled steak.
Example 3: Assume the voice command is "The volume is too loud", the function category is volume, the control instruction is decrease, and the preset threshold is 1. From the function category and Table 2, the first candidate terminal device set is determined to be: 3 television and 5 smart speaker. From the function state of each terminal device in the first candidate terminal device set and the control instruction, the second candidate terminal device set is determined to be: 5 smart speaker. Because the number of second candidate terminal devices in the second candidate terminal device set equals the preset threshold, the second terminal device is determined to be 5 smart speaker, and the smart speaker is controlled to lower its volume.
In some embodiments, if it is detected that the television is also playing a video, the user is prompted to choose whether the volume should be lowered on the television or on the smart speaker.
Example 4: Assume the voice command is "Close the door", the function category is door, the control instruction is close, and the preset threshold is 1. From the function category and Table 2, the first candidate terminal device set is determined to be: 7 refrigerator and 8 washing machine. From the function state of each terminal device in the first candidate terminal device set and the control instruction, the second candidate terminal device set is determined to be: 7 refrigerator. Because the number of second candidate terminal devices in the second candidate terminal device set equals the preset threshold, the second terminal device is determined to be 7 refrigerator, and the refrigerator is controlled to close its door.
In some embodiments, if it is detected that the oven door is also open, the user is prompted to choose whether the door to be closed is that of the refrigerator or that of the oven.
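The two-stage filtering used in Examples 1 and 4 can be traced in code against the rows of Table 2. The sketch below is an assumption-laden illustration: the `ACTIONABLE_STATES` rule table (which function states make a control instruction applicable) is hypothetical, since the application does not say how a function state is matched against a control instruction, and the names `Device`, `first_candidates`, and `second_candidates` are invented here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Device:
    device_id: int
    name: str
    category: str
    state: str


# Rows of Table 2.
TABLE_2 = [
    Device(1, "curtain", "brightness", "closed"),
    Device(2, "television", "brightness", "normal"),
    Device(3, "television", "volume", "normal"),
    Device(4, "television", "recipe scene", "showing UI"),
    Device(5, "smart speaker", "music scene", "playing"),
    Device(6, "oven", "food scene", "steak placed"),
    Device(7, "refrigerator", "door", "open"),
    Device(8, "washing machine", "door", "closed"),
]

# Hypothetical rule: function states in which an instruction is worth executing.
ACTIONABLE_STATES = {
    ("brightness", "increase"): {"closed"},
    ("door", "close"): {"open"},
}


def first_candidates(table: List[Device], category: str) -> List[Device]:
    """Stage 1: keep devices whose function category matches."""
    return [d for d in table if d.category == category]


def second_candidates(candidates: List[Device], category: str, instruction: str) -> List[Device]:
    """Stage 2: keep devices whose function state makes the instruction applicable."""
    states = ACTIONABLE_STATES.get((category, instruction), set())
    return [d for d in candidates if d.state in states]
```

Under these assumed rules, the brightness category selects devices 1 and 2 and the increase instruction narrows them to device 1 (the closed curtain); the door category selects devices 7 and 8 and the close instruction narrows them to device 7 (the open refrigerator).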
In some embodiments, an embodiment of the present application provides an electronic device that includes at least a processor and a memory, where the processor, when executing a computer program stored in the memory, implements any one of the voice control methods for a terminal device described above.
In some embodiments, an embodiment of the present application provides a computer-readable non-volatile storage medium storing a computer program that, when executed by a processor, implements any one of the voice control methods for a terminal device described above.

Claims (33)

  1. A server for voice control, the server being configured to:
    receive a voice signal sent by a first terminal device, generate a voice command according to the voice signal, and feed the voice command back to the first terminal device;
    when the first terminal device cannot perform the operation corresponding to the voice command, receive an instruction distribution request sent by the first terminal device, the instruction distribution request carrying the voice command;
    according to the instruction distribution request, search for a second terminal device that can perform the operation corresponding to the voice command, and send the voice command to the second terminal device, so that the second terminal device performs the corresponding operation in response to the voice command;
    when the first terminal device can perform the operation corresponding to the voice command, not receive an instruction distribution request sent by the first terminal device.
  2. The server according to claim 1, wherein, when the voice command carries only a device name, sending the voice command to the second terminal device according to the instruction distribution request comprises:
    searching, according to the instruction distribution request, for the second terminal device corresponding to the device name, and sending the voice command to the second terminal device.
  3. The server according to claim 1, wherein, when the voice command carries only a device capability parameter, sending the voice command to the second terminal device according to the instruction distribution request comprises:
    searching, according to the instruction distribution request, for the second terminal device having the device capability parameter, and sending the voice command to the second terminal device.
  4. The server according to claim 1, wherein, when the voice command is a command corresponding to a custom rule, the custom rule defining a correspondence between the voice command and a terminal device, sending the voice command to the second terminal device according to the instruction distribution request comprises:
    searching, according to the instruction distribution request, for the second terminal device that has the correspondence under the custom rule, and sending the voice command to the second terminal device.
  5. The server according to claim 1, wherein the voice command includes at least two matching items, each matching item being assigned a weight attribute value;
    sending the voice command to the second terminal device according to the instruction distribution request comprises:
    when at least two terminal devices each satisfy at least one of the matching items in the voice command, calculating, for each such terminal device, the total weight attribute value of the matching items it satisfies, and sending the voice command to the second terminal device, wherein the total weight attribute value is the sum of the weight values of the satisfied matching items, and the second terminal device is the terminal device with the largest total weight attribute value.
  6. The server according to claim 5, wherein each matching item is one of a device name, a device response time period, a space in which the device is located, and a device capability parameter.
  7. The server according to claim 1, wherein the server is configured to:
    parse service requirement information from the voice command;
    screen for a second terminal device according to the service requirement information, the second terminal device being a terminal device whose device state is capable of fulfilling the service requirement information;
    the server being further configured to send a silence instruction to the terminal devices in the current voice control system other than the second terminal device.
  8. The server according to claim 7, wherein the server is further configured to:
    acquire voice audio data corresponding to the voice command, and identify a wake-up word from the voice audio data;
    if the voice audio data includes the wake-up word, locate the voice control system in which the first terminal device is located;
    send a status acquisition request to the voice control system, so that all terminal devices in the voice control system report their device states after receiving the status acquisition instruction.
  9. The server according to claim 7, wherein the service requirement information includes a service type and a service state, and the server is further configured to:
    in the step of screening for the second terminal device according to the service requirement information, extract the service type and the service state from the service requirement information;
    match, among the device states, candidate terminal devices that satisfy the service type, each candidate terminal device having a device type that meets the requirements of the service type;
    traverse the device states of the candidate terminal devices to screen out the second terminal device whose device state conforms to the service state.
  10. The server according to claim 9, wherein the service requirement information further includes a service execution location, and the server is further configured to:
    in the step of screening for the second terminal device according to the service requirement information, extract the service execution location from the service requirement information;
    acquire the device location of each candidate terminal device in the current voice control system;
    if the device location of a candidate terminal device coincides with the service execution location, perform the step of traversing the device states of the candidate terminal devices;
    if the device location of a candidate terminal device does not coincide with the service execution location, mark that candidate terminal device as not being the second terminal device.
  11. The server according to claim 10, wherein the server is further configured to:
    in the step of screening for the second terminal device according to the service requirement information, acquire the number of terminal devices whose device state is capable of fulfilling the service requirement information;
    if the number of terminal devices is greater than or equal to 2, search for a main terminal device, so as to determine the second terminal device through interaction between the main terminal device and the user, the main terminal device being one of the multiple terminal devices capable of fulfilling the service requirement information;
    if the number of terminal devices is equal to 1, mark the terminal device capable of fulfilling the service requirement information as the second terminal device.
  12. The server according to claim 11, wherein the server is further configured to:
    after the step of searching for the main terminal device, send an inquiry instruction to the main terminal device so that the main terminal device plays an inquiry voice, the inquiry instruction being a multi-round wake-up-free voice interaction instruction;
    receive a confirmation voice command input by the user through the main terminal device;
    extract identification information of the second terminal device from the confirmation voice command;
    screen out, according to the identification information of the second terminal device, the second terminal device from the multiple terminal devices capable of fulfilling the service requirement information.
  13. The server according to claim 7, wherein the server is further configured to:
    parse identification information of the second terminal device from the voice command;
    if the voice command includes the identification information of the second terminal device, generate a control command and feedback voice information according to the voice command;
    send the control command to the second terminal device according to the identification information of the second terminal device, and send the feedback voice information to the first terminal device.
  14. The server according to claim 7, wherein the server is further configured to:
    after the step of sending a response instruction to the second terminal device, receive execution result data reported by the second terminal device, the execution result data including the new device state after the response instruction has been run;
    extract the new device state from the execution result data;
    update the device state stored in the storage module with the new device state.
  15. The server according to claim 1, wherein the server is configured to:
    receive a voice command that contains a user identifier and is sent by the first terminal device;
    search for all terminal devices related to the user identifier;
    when no terminal device related to the user identifier exists, feed back a parameter indicating that no terminal device exists, so that the first terminal device announces that there is no terminal device to execute the voice command;
    when terminal devices related to the user identifier exist, screen out the best-matching second terminal device using preset filtering rules, feed back a parameter identifying the best-matching second terminal device, so that the first terminal device announces that a second terminal device exists to execute the voice command, and control the best-matching second terminal device to execute the voice command.
  16. The server according to claim 15, wherein the filtering rules characterize the mapping between the voice command and the terminal devices, and the filtering rules comprise a first group of rules and a second group of rules, or comprise only the first group of rules, wherein the first group of rules are the rules required for screening out the best-matching second terminal device, and the second group of rules are rules applied one after another, cumulatively, when the best-matching second terminal device has not yet been screened out.
  17. The server according to claim 16, wherein the first group of rules includes a terminal device function permission sub-rule, and in the step of screening out the best-matching second terminal device using the preset filtering rules when terminal devices related to the user identifier exist, the server is further configured to:
    screen the terminal devices related to the user identifier using the first group of rules;
    if the terminal devices related to the user identifier do not have permission to execute the voice command, feed back a parameter indicating that no second terminal device exists, so that the first terminal device announces that there is no second terminal device to execute the voice command;
    if terminal devices related to the user identifier have permission to execute the voice command, confirm the number of second terminal devices having permission to execute the voice command.
  18. The server according to claim 17, wherein the second group of rules includes multiple sub-rules that have priorities, and after confirming the number of second terminal devices having permission to execute the voice command, the server is further configured to:
    when there is one second terminal device having permission to execute the voice command, feed back a parameter indicating that a second terminal device with the corresponding permission exists, so that the first terminal device announces the second terminal device with the corresponding permission, and control that second terminal device to execute the voice command;
    when there are multiple second terminal devices having permission to execute the voice command, apply the sub-rules in the second group of rules one by one, in descending order of priority, to the multiple second terminal devices having permission to execute the voice command, until a single best-matching second terminal device is screened out.
  19. The server according to claim 16, wherein the second group of rules includes at least one of a user usage frequency sub-rule, a distance-from-the-first-terminal-device sub-rule, and a terminal device priority sub-rule.
  20. The server according to claim 19, wherein, when screening the second terminal devices using the user usage frequency sub-rule, the server is configured to:
    detect the execution frequency of each of the multiple second terminal devices having permission to execute the voice command, wherein the execution frequency is the number of times the second terminal device has, in its history, executed voice commands with the same function;
    retain the second terminal device with the highest execution frequency.
  21. The server according to claim 19, wherein, when screening the second terminal devices using the distance-from-the-first-terminal-device sub-rule, the server is configured to:
    detect the distance between the first terminal device and each of the multiple second terminal devices having permission to execute the voice command;
    retain the second terminal device closest to the first terminal device.
  22. The server according to claim 19, wherein, when screening the second terminal devices using the terminal device priority sub-rule, the server is configured to:
    retain, according to the terminal device priorities set by the user, the second terminal device with the highest priority.
  23. The server according to claim 15, wherein, after receiving the voice command sent by the first terminal device, the server is further configured to:
    load all terminal devices related to the user identifier, together with their device attributes, from a database into a cache, so that the server searches for the second terminal device in the cache.
  24. The server according to claim 1, wherein the server is configured to:
    recognize the voice command to obtain corresponding control information, the control information including a function category and a control instruction;
    determine, according to a pre-established terminal device information table, a first candidate terminal device set corresponding to the function category;
    determine, based on the function state of each candidate terminal device in the first candidate terminal device set, a second candidate terminal device set matching the control instruction;
    determine, from the second candidate terminal device set, the second terminal device matching the control instruction.
  25. The server according to claim 24, wherein the server is specifically configured to:
    determine the number of second candidate terminal devices included in the second candidate terminal device set matching the control instruction;
    determine the second terminal device from the second candidate terminal device set according to how the number compares with a preset threshold, and control the second terminal device to execute the control instruction.
  26. The server according to claim 25, wherein the server is specifically configured to:
    if the number of second candidate terminal devices is less than or equal to the preset threshold, determine all second candidate terminal devices included in the second candidate terminal device set as second terminal devices;
    control each second terminal device to execute the control instruction.
  27. The server according to claim 26, wherein the server is further configured to:
    if the number of second candidate terminal devices is greater than the preset threshold, send first prompt information, where the first prompt information instructs the user to determine the second terminal device from the second candidate terminal device set;
    receive first response information, where the first response information includes first identification information corresponding to the second terminal device;
    control the second terminal device corresponding to the first identification information to execute the control instruction.
  28. The server according to claim 24, wherein the server is specifically configured to:
    perform text recognition on the voice command using a speech recognition method to obtain text information corresponding to the voice command;
    perform semantic understanding on the text information using a semantic understanding method to obtain the control information contained in the text information.
  29. The server according to claim 24, wherein the server is specifically configured to:
    obtain the device name, function name, function category and function state corresponding to each terminal device included in a preset scene;
    create or update the corresponding terminal device information table according to all of the device names, function names, function categories and function states.
  30. The server according to any one of claims 24-29, wherein the server is further configured to:
    if the first candidate terminal device set is empty, or if the first candidate terminal device set is non-empty and the second candidate terminal device set is empty, send second prompt information, where the second prompt information instructs the user to determine the second terminal device from multiple terminal devices;
    receive second response information, where the second response information includes second identification information corresponding to the second terminal device;
    control the second terminal device corresponding to the second identification information to execute the control instruction.
  31. A terminal device for voice control, comprising:
    a sound collector configured to collect a voice signal input by a user;
    a controller configured to:
    receive the voice signal input by the user from the sound collector, send the voice signal to a server, and receive a voice command from the server, where the voice command is generated according to the voice signal;
    when the terminal device can perform the operation corresponding to the voice command, perform that operation in response to the voice command;
    when the terminal device cannot perform the operation corresponding to the voice command, generate an instruction distribution request and send it to the server, so that the server, according to the instruction distribution request, searches for other terminal devices that can perform the operation corresponding to the voice command and sends the voice command to those other terminal devices.
  32. The terminal device according to claim 31, wherein the terminal device is configured with a local capability attribute parameter, and the terminal device determines whether it can perform the operation corresponding to the voice command by:
    parsing a to-be-processed capability attribute parameter from the voice command;
    determining that the terminal device can perform the operation corresponding to the voice command when the local capability attribute parameter matches the to-be-processed capability attribute parameter;
    determining that the terminal device cannot perform the operation corresponding to the voice command when the local capability attribute parameter does not match the to-be-processed capability attribute parameter.
  33. The terminal device according to claim 31, wherein the controller is further configured to:
    receive a response command or a silent command issued by the server;
    run the response command or the silent command.
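
The device selection recited in claims 24 to 27 and 30 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The names DeviceEntry, REQUIRED_STATE and select_targets are hypothetical, and the rule that a device "matches" a control instruction when its current function state is the one the instruction would change is an assumption made only for this example; the claims do not define the matching rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceEntry:
    """One row of a hypothetical terminal device information table."""
    device_name: str
    function_category: str
    function_state: str


# Assumed matching rule: an instruction applies to devices currently in this state.
REQUIRED_STATE = {"turn_on": "off", "turn_off": "on"}


def select_targets(table, category, instruction, threshold):
    """Return (action, devices) following the flow of claims 24-27 and 30."""
    # Claim 24: first candidate set, filtered by function category.
    first = [d for d in table if d.function_category == category]
    # Claim 24: second candidate set, filtered by function state.
    wanted = REQUIRED_STATE.get(instruction)
    second = [d for d in first if d.function_state == wanted]
    # Claim 30: an empty first or second set means the user must choose.
    if not second:
        return "prompt_any", list(table)
    # Claim 26: few enough candidates, so all of them execute the instruction.
    if len(second) <= threshold:
        return "execute", second
    # Claim 27: too many candidates, so the user picks from the second set.
    return "prompt", second


table = [
    DeviceEntry("living room light", "lighting", "off"),
    DeviceEntry("bedroom light", "lighting", "on"),
    DeviceEntry("television", "display", "off"),
]
action, devices = select_targets(table, "lighting", "turn_on", threshold=1)
print(action, [d.device_name for d in devices])
```

In this sketch the caller would send the first or second prompt information when the action is "prompt" or "prompt_any", and would dispatch the control instruction directly when the action is "execute".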
PCT/CN2022/100547 2021-06-22 2022-06-22 Terminal device and server for voice control WO2022268136A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202280038248.XA CN117882130A (en) 2021-06-22 2022-06-22 Terminal equipment and server for voice control

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
CN202110688867.0A CN113450792A (en) 2021-06-22 2021-06-22 Voice control method of terminal equipment, terminal equipment and server
CN202110688867.0 2021-06-22
CN202110917713.4A CN115910050A (en) 2021-08-11 2021-08-11 Server and voice control method
CN202110917713.4 2021-08-11
CN202111521226.2 2021-12-13
CN202111521226.2A CN114172757A (en) 2021-12-13 2021-12-13 Server, intelligent home system and multi-device voice awakening method
CN202210151526.4 2022-02-18
CN202210151526.4A CN114609920A (en) 2022-02-18 2022-02-18 Intelligent household control method and device, computer equipment and medium

Publications (1)

Publication Number Publication Date
WO2022268136A1 (en) 2022-12-29

Family

ID=84544127

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100547 WO2022268136A1 (en) 2021-06-22 2022-06-22 Terminal device and server for voice control

Country Status (2)

Country Link
CN (1) CN117882130A (en)
WO (1) WO2022268136A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116009748A (en) * 2023-03-28 2023-04-25 深圳市人马互动科技有限公司 Picture information interaction method and device in children interaction story

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818782A (en) * 2016-09-12 2018-03-20 上海声瀚信息科技有限公司 A kind of method and system for realizing household electrical appliance intelligent control
CN108766432A (en) * 2018-07-02 2018-11-06 珠海格力电器股份有限公司 A kind of method to cooperate between control household electrical appliances
CN109474843A (en) * 2017-09-08 2019-03-15 腾讯科技(深圳)有限公司 The method of speech control terminal, client, server
US20190304466A1 (en) * 2018-03-30 2019-10-03 Boe Technology Group Co., Ltd. Voice control method, voice control device and computer readable storage medium
US20190318736A1 (en) * 2018-04-11 2019-10-17 Baidu Online Network Technology (Beijing) Co., Ltd Method for voice controlling, terminal device, cloud server and system
CN111722824A (en) * 2020-05-29 2020-09-29 北京小米松果电子有限公司 Voice control method, device and computer storage medium
CN111883129A (en) * 2020-08-03 2020-11-03 海信视像科技股份有限公司 Terminal device control method and device and terminal device
CN112017652A (en) * 2019-05-31 2020-12-01 华为技术有限公司 Interaction method and terminal equipment
CN113450792A (en) * 2021-06-22 2021-09-28 海信视像科技股份有限公司 Voice control method of terminal equipment, terminal equipment and server

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116009748A (en) * 2023-03-28 2023-04-25 深圳市人马互动科技有限公司 Picture information interaction method and device in children interaction story
CN116009748B (en) * 2023-03-28 2023-06-06 深圳市人马互动科技有限公司 Picture information interaction method and device in children interaction story

Also Published As

Publication number Publication date
CN117882130A (en) 2024-04-12

Similar Documents

Publication Publication Date Title
US10755706B2 (en) Voice-based user interface with dynamically switchable endpoints
KR102480949B1 (en) Coordinating signal processing between digital voice assistant computing devices
JP6516585B2 (en) Control device, method thereof and program
WO2019205134A1 (en) Smart home voice control method, apparatus, device and system
KR102551715B1 (en) Generating iot-based notification(s) and provisioning of command(s) to cause automatic rendering of the iot-based notification(s) by automated assistant client(s) of client device(s)
JP2018531404A (en) Proposal of history-based key phrase for voice control of home automation system
JP2018531404A6 (en) Proposal of history-based key phrase for voice control of home automation system
CN108683574A (en) A kind of apparatus control method, server and intelligent domestic system
CN114172757A (en) Server, intelligent home system and multi-device voice awakening method
CN107204903A (en) Intelligent domestic system and its control method
CN111665737A (en) Intelligent household scene control method and system
WO2022268136A1 (en) Terminal device and server for voice control
CN114067798A (en) Server, intelligent equipment and intelligent voice control method
CN111367188A (en) Smart home control method and device, electronic equipment and computer storage medium
US20200213653A1 (en) Automatic input selection
CN111817936A (en) Control method and device of intelligent household equipment, electronic equipment and storage medium
US20220376980A1 (en) Methods and systems for controlling operations of devices in an internet of things (iot) environment
CN113450792A (en) Voice control method of terminal equipment, terminal equipment and server
CN116566760B (en) Smart home equipment control method and device, storage medium and electronic equipment
WO2018023514A1 (en) Home background music control system
EP3557574A1 (en) Voice control method, server, and voice exchange system
CN113296415A (en) Intelligent household electrical appliance control method, intelligent household electrical appliance control device and system
CN116582381B (en) Intelligent device control method and device, storage medium and intelligent device
WO2023016126A1 (en) Terminal device, server, and multi-device collaboration login method
WO2023246151A9 (en) Display device and control method

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22827628

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE