WO2022268136A1 - Terminal device and server for voice control - Google Patents


Info

Publication number
WO2022268136A1
Authority
WO
WIPO (PCT)
Prior art keywords
terminal device
voice
server
command
terminal
Prior art date
Application number
PCT/CN2022/100547
Other languages
English (en)
Chinese (zh)
Inventor
王冰
李含珍
张路伟
Original Assignee
海信视像科技股份有限公司
聚好看科技股份有限公司
Priority date
Filing date
Publication date
Priority claimed from CN202110688867.0A (published as CN113450792A)
Priority claimed from CN202110917713.4A (published as CN115910050A)
Priority claimed from CN202111521226.2A (published as CN114172757A)
Priority claimed from CN202210151526.4A (published as CN114609920A)
Application filed by 海信视像科技股份有限公司 and 聚好看科技股份有限公司
Priority to CN202280038248.XA (published as CN117882130A)
Publication of WO2022268136A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present application relates to the technical field of voice interaction, in particular to a terminal device and a server for voice control.
  • With the development of voice interaction technology, more and more home terminal devices have a voice interaction function. Using this function, the user can control these terminal devices by voice to perform corresponding operations, such as starting and stopping.
  • In a typical voice control process, the user inputs a voice signal; the terminal device collects the voice signal, converts it into a corresponding instruction, and then performs the corresponding operation according to that instruction.
  • However, the voice interaction functions of most terminal devices are limited by distance, so users cannot control the devices they want from anywhere in the home. For example, the smart TV in the bedroom cannot be turned on or off by voice from the kitchen, and the temperature of the air conditioner in the bedroom cannot be adjusted by voice from the living room. To control a terminal device, the user must move within its effective range or raise their voice, resulting in a poor user experience.
  • This embodiment provides a terminal device and a server for voice control. The terminal device includes: a voice collector configured to collect a voice signal input by a user; and a controller configured to receive the voice signal from the voice collector, send the voice signal to a server, and receive a voice instruction from the server, where the voice instruction is generated according to the voice signal. When the terminal device can perform the operation corresponding to the voice instruction, the controller performs that operation in response to the voice instruction; when the terminal device cannot perform the operation, the controller generates an instruction distribution request and sends it to the server.
  • The server is configured to search, according to the instruction distribution request, for another terminal device capable of performing the operation corresponding to the voice instruction, and to send the voice instruction to that device.
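The execute-or-forward behavior described above can be sketched in Python. This is an illustrative sketch only: the function names, the request structure, and the stand-in server are assumptions, not part of the patent.

```python
# Hedged sketch of the controller logic: execute the voice command locally
# when the terminal supports it, otherwise forward an instruction
# distribution request to the server. All names are illustrative.

def handle_voice_command(command, local_capabilities, send_to_server):
    """Execute `command` locally if supported, else forward it via the server."""
    if command in local_capabilities:
        return ("executed_locally", command)
    # The terminal cannot perform the operation: build an instruction
    # distribution request carrying the voice command and send it on.
    request = {"type": "instruction_distribution", "command": command}
    return ("forwarded", send_to_server(request))

def demo_server(request):
    # Stand-in for the server's search among other terminal devices.
    capable = {"turn on the tv": "bedroom_tv"}
    return capable.get(request["command"])
```

A terminal that can play music executes "play music" itself, while "turn on the tv" is handed to the server for distribution.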
  • FIG. 1 is a schematic diagram of a voice interaction principle provided by an embodiment of the present application.
  • FIG. 2 is a schematic framework diagram of a voice control system of a terminal device provided in an embodiment of the present application
  • FIG. 3 is a schematic diagram of a scenario of a voice control system of a terminal device provided in an embodiment of the present application
  • FIG. 4 is a schematic diagram of a scene of another voice control system of a terminal device provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a scene of another voice control system of a terminal device provided by an embodiment of the present application.
  • FIG. 6 is a signaling diagram of a voice control method for a terminal device provided in an embodiment of the present application.
  • FIG. 7 is a usage scenario diagram of a voice control system provided by an embodiment of the present application.
  • FIG. 8 is a hardware configuration diagram of a terminal device provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a voice interaction process provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of multiple terminal devices responding to voice interaction effects provided by an embodiment of the present application.
  • FIG. 11 is a schematic flowchart of a multi-device voice wake-up method provided by an embodiment of the present application.
  • FIG. 12 is a schematic flow diagram of a screening process for a second terminal device provided in an embodiment of the present application.
  • FIG. 13 is a schematic flowchart of determining a second terminal device according to the number of devices provided by an embodiment of the present application.
  • FIG. 14 is a schematic flow diagram of marking a master device provided by an embodiment of the present application.
  • FIG. 15 is a schematic diagram of a process flow for updating device status provided by an embodiment of the present application.
  • FIG. 16 is a server-side sequence flow chart of a multi-device voice wake-up method provided by an embodiment of the present application.
  • FIG. 17 is a sequence flow chart on the terminal device side of a multi-device voice wake-up method provided by an embodiment of the present application.
  • FIG. 18 is an application scenario diagram of a terminal device provided in an embodiment of the present application.
  • FIG. 19 is another application scenario diagram of a terminal device provided by an embodiment of the present application.
  • FIG. 20 is a schematic flowchart of a voice control method provided in an embodiment of the present application.
  • FIG. 21 is a schematic flowchart of a screening terminal device provided by an embodiment of the present application.
  • FIG. 22 is a schematic diagram of a scene of a voice control process provided by an embodiment of the present application.
  • FIG. 23A is a schematic flowchart of a voice control method provided by an embodiment of the present application.
  • FIG. 23B is a schematic diagram of the principle of a voice control method provided in the embodiment of the present application.
  • FIG. 23C is a schematic diagram of the process of determining the second set of candidate terminal devices in the embodiment of the present application.
  • FIG. 24A is a schematic flowchart of another voice control method provided by the embodiment of the present application.
  • FIG. 24B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application.
  • FIG. 25A is a schematic flowchart of another voice control method provided by the embodiment of the present application.
  • FIG. 25B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application.
  • FIG. 26A is a schematic flowchart of another voice control method provided by the embodiment of the present application.
  • FIG. 26B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application.
  • FIG. 26C is a schematic diagram of the process of obtaining control information in the embodiment of the present application.
  • FIG. 27A is a schematic structural diagram of a local control device in an embodiment of the present application.
  • FIG. 27B is a schematic structural diagram of interaction between a local control device and a terminal device in an embodiment of the present application.
  • FIG. 1 is a schematic diagram of a voice interaction principle provided by an embodiment of the present application.
  • the smart device is used to receive input information and output processing results of the information.
  • The speech recognition service device is an electronic device deployed with a speech recognition service.
  • The semantic service device is an electronic device deployed with a semantic service.
  • The business service device is an electronic device deployed with business services.
  • the electronic device here may include a server, a computer, etc.
  • The speech recognition service, the semantic service (also called a semantic engine), and the business services here are web services that can be deployed on electronic devices. The speech recognition service is used to recognize audio as text, the semantic service is used for semantic analysis of the text, and the business services provide specific services such as the weather query service of Moji Weather or the music query service of QQ Music.
  • In the architecture shown in FIG. 1, there may be multiple entity service devices deployed with different business services, or one or more functional services may be integrated into one or more entity service devices.
  • The following describes, as an example, the process of handling information input to the smart device based on the architecture shown in FIG. 1. Taking the input as a query sentence entered by voice, the process may include the following three stages:
  • After receiving the query sentence input by voice, the smart device can upload the audio of the query sentence to the speech recognition service device, so that the speech recognition service device recognizes the audio as text through the speech recognition service and returns the text to the smart device.
  • Before uploading the audio of the query sentence to the speech recognition service device, the smart device may perform denoising on the audio, which may include steps such as removing echo and environmental noise.
  • the smart device uploads the text of the query sentence recognized by the speech recognition service to the semantic service device, so that the semantic service device can perform semantic analysis on the text through the semantic service to obtain the business field and intention of the text.
  • the semantic service device sends a query instruction to the corresponding business service device to obtain the query result given by the business service.
  • the smart device can obtain and output the query result from the semantic service device.
  • the semantic service device can also send the semantic analysis result of the query sentence to the smart device, so that the smart device can output the feedback sentence in the semantic analysis result.
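The three-stage pipeline above (speech recognition, then semantic analysis, then a business service) can be sketched as follows. The stage implementations are placeholder stubs for illustration; none of the function names or return values come from the patent.

```python
# Illustrative sketch of the pipeline: audio -> text -> semantics -> answer.

def speech_recognition(audio):
    # Stage 1: recognize audio as text (stubbed as a pre-made transcript).
    return audio["transcript"]

def semantic_analysis(text):
    # Stage 2: derive the business field and intent of the text.
    if "weather" in text:
        return {"field": "weather", "intent": "query"}
    return {"field": "unknown", "intent": "unknown"}

def business_service(semantics):
    # Stage 3: the matching business service answers the query.
    if semantics["field"] == "weather":
        return "Sunny, 25 degrees"
    return "Sorry, I did not understand."

def handle_query(audio):
    return business_service(semantic_analysis(speech_recognition(audio)))
```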
  • FIG. 1 is only an example, and does not limit the protection scope of the present application. In the embodiment of the present application, other architectures may also be used to implement similar functions. For example, all or part of the three processes may be completed by a smart terminal, which will not be described in detail here.
  • The smart device shown in FIG. 1 can be a display device, such as a smart TV. The function of the speech recognition service device can be realized by the cooperation of the sound collector and the controller on the display device, and the functions of the semantic service device and the business service device can be realized by the controller of the display device or by the server of the display device.
  • FIG. 2 is a schematic framework diagram of a voice control system for a terminal device provided in an embodiment of the present application.
  • the system includes at least two terminal devices 200 and a server 400 .
  • the terminal device 200 is configured to collect voice signals input by the user.
  • the terminal device 200 communicates with the server 400.
  • the server 400 is configured to receive a signal or a request sent by the terminal device 200 , and feed back corresponding instructions to the terminal device 200 .
  • The sound collector of the terminal device 200-1 collects the voice signal input by the user. The terminal device 200-1 then sends the collected voice signal to the server 400.
  • The server 400 generates a voice instruction according to the voice signal. It should be noted that the server 400 uses its semantic system to convert the voice signal into the voice instruction; the specific conversion process is not limited in this application.
  • the server 400 feeds back the converted voice instruction to the terminal device 200-1.
  • The local execution capability module of the terminal device 200-1 judges whether the device can perform the operation corresponding to the voice instruction. If it can, the voice instruction is sent to the controller, and the controller, in response to the voice instruction, controls the terminal device 200-1 to perform the corresponding operation.
  • If the device cannot perform the operation, an instruction distribution request carrying the voice instruction is generated and sent to the server 400.
  • After receiving the instruction distribution request, the server 400 searches for a terminal device 200 that can execute the voice instruction. For example, if the server finds that the terminal device 200-2 can perform the corresponding operation, it sends the voice instruction to the terminal device 200-2, so that the controller of the terminal device 200-2, in response to the voice instruction, controls the terminal device 200-2 to perform the operation.
  • For example, the user inputs the voice signal "turn on the TV" near the smart speaker; the TV is in the bedroom, but the smart speaker is in the kitchen.
  • After receiving the voice signal "turn on the TV", the smart speaker sends the voice signal to the server 400.
  • the server converts the voice signal into a voice command, and feeds the voice command back to the smart speaker.
  • Since the smart speaker cannot perform the operation of turning on the TV, it sends an instruction distribution request carrying the voice instruction "turn on the TV" to the server 400.
  • After receiving the instruction distribution request, the server 400 searches for a terminal device capable of executing the voice instruction "turn on the TV". If it finds that this device is the TV, it sends the voice instruction to the TV in the bedroom. After receiving the voice instruction, the TV in the bedroom responds by performing the power-on operation. In this way, the user can turn on the TV in the bedroom by voice without being in the bedroom.
  • each terminal device is configured with a local execution capability filtering module, and the local execution capability filtering module is configured with local capability attribute parameters.
  • The specific steps for a terminal device to determine whether it can perform the operation corresponding to the voice instruction are as follows:
  • The local capability attribute parameter is matched against the pending capability attribute parameter. If they match, the terminal device can execute the operation corresponding to the voice instruction; if they do not match, it cannot.
  • the terminal device 200-1 is a display device
  • the terminal device 200-2 is an air conditioner device
  • the terminal 200-3 is a washing machine device
  • the terminal 200-4 is a refrigerator device. Then the local capability attribute parameter of the terminal device 200-1 is playing audio and video, that of the terminal 200-2 is cooling and heating, that of the terminal 200-3 is laundry, and that of the terminal 200-4 is refrigeration.
  • For example, the terminal device 200-2 collects the voice signal and receives the voice instruction "heating" sent by the server 400. The pending capability attribute parameter "heating" can be parsed from the voice instruction. The local capability filtering module of the terminal device 200-2 then determines that its local capability attribute parameters include "heating", so the local capability attribute parameter matches the pending capability attribute parameter. This means that the terminal device 200-2 can perform the operation corresponding to the voice instruction "heating".
  • The local capability filtering module of the present application can also perform a corresponding conversion on the parsed pending capability attribute parameter. For example, if the user inputs, within the signal receiving range of the terminal device 200-2, a voice signal that is a synonymous expression for heating (such as "warm up"), the parsed text does not exactly match the local capability attribute parameter "heating" of the terminal device 200-2.
  • In this case, the local capability filtering module can analyze the pending capability attribute parameter and determine that it has the same meaning as "heating". The pending capability attribute parameter is therefore considered to match the local capability attribute parameter; that is, the terminal device 200-2 can perform the operation corresponding to the voice signal.
  • As another example, the terminal device 200-2 collects the voice signal and receives the voice instruction "play music" sent by the server 400. The pending capability attribute parameter "play music" can be parsed from the voice instruction. The local capability filtering module of the terminal device 200-2 determines that its local capability attribute parameters are "cooling" and "heating", which do not match the pending capability attribute parameter. This means that the terminal device 200-2 cannot perform the operation corresponding to the voice instruction "play music".
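The matching behavior in the examples above, an exact comparison plus a synonym-aware fallback, can be sketched as follows. The synonym table and device capability set are assumptions chosen to mirror the examples, not data from the patent.

```python
# Sketch of the local capability filtering module: try an exact match
# against the device's local capability attribute parameters, then fall
# back to a small (illustrative) synonym table so that expressions with
# the same meaning as a capability still match.

SYNONYMS = {"warm up": "heating", "make it colder": "cooling"}  # illustrative

def matches_capability(pending, local_capabilities):
    """True if the pending capability matches, exactly or via a synonym."""
    if pending in local_capabilities:
        return True
    return SYNONYMS.get(pending) in local_capabilities

# Assumed capability set for an air-conditioner-like device (200-2).
AIR_CONDITIONER = {"cooling", "heating"}
```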
  • The server searches for the second terminal device corresponding to the device name according to the instruction distribution request and sends the voice instruction to the second terminal device, so that the second terminal device performs the corresponding operation in response to the voice instruction.
  • FIG. 3 is a schematic diagram of a voice control system for a terminal device provided in the embodiment of the present application.
  • Suppose the user inputs voice commands such as "turn on device 3" or "shut down device 3". These voice commands all include only the device name "device 3". Since the device name of device 1 does not match the device name included in the voice command, device 1 cannot perform the corresponding operation.
  • The server then searches for a terminal device whose name matches. It finds that the device name of device 3 matches the device name included in the voice command and sends the voice command to device 3. After device 3 receives the voice command "turn on device 3", it responds by performing the power-on operation; after it receives the voice command "shut down device 3", it responds by performing the shutdown operation.
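The name-based distribution in this example amounts to a lookup from the device name carried in the voice command to a registered device. A minimal sketch, with an illustrative registry that is not from the patent:

```python
# Sketch of name-based distribution: the server resolves the device name
# carried in the voice command to a registered terminal device.

DEVICE_REGISTRY = {
    "device 1": "addr-1",
    "device 2": "addr-2",
    "device 3": "addr-3",
}

def route_by_name(device_name):
    """Return the address of the device with a matching name, or None."""
    return DEVICE_REGISTRY.get(device_name)
```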
  • In some embodiments, the second terminal device can also use its local capability filtering module to reconfirm whether it can perform the operation corresponding to the voice command. If so, it performs the corresponding operation in response to the voice command. If not, the second terminal device can feed back an error signal to the server, so that the server searches again for a terminal device that can perform the operation.
  • The server searches for a second terminal device having the required device capability according to the instruction distribution request and sends the voice instruction to the second terminal device, so that the second terminal device performs the corresponding operation in response to the voice instruction.
  • FIG. 4 is a schematic diagram of another voice control system scenario of a terminal device provided in an embodiment of the present application.
  • For example, the user inputs voice commands such as "reduce the temperature", "adjust the temperature to 20 degrees", "increase the temperature", or "increase the wind speed" within the signal receiving range of the speaker device.
  • These voice commands include only device capabilities. Since the local capability attribute parameter of the speaker device does not match the device capability parameter included in these voice commands, the speaker device cannot perform the corresponding operations.
  • the speaker device sends an instruction distribution request to the server, and the server searches for a terminal device that meets the device's capability parameters according to the instruction distribution request. Among the devices shown in FIG. 4 , only the local capability attribute parameter of the air conditioner conforms to the device capability parameter. Then the server sends the voice command to the air conditioner, and the air conditioner performs corresponding operations in response to the voice command after receiving the voice command.
  • If multiple terminal devices meet the conditions, the server sends the voice command to all of them, and each of these terminal devices performs the corresponding operation in response to the voice command.
  • For example, the user inputs "lower temperature" within the signal receiving range of the air conditioner. The local capability filtering module of the air conditioner first judges, according to its local capability attribute parameters, that the air conditioner itself can perform the corresponding operation. The air conditioner also sends an instruction distribution request to the server. According to the device capability parameter carried in the instruction distribution request, the server searches for terminal devices other than the air conditioner that also meet the device capability parameter, that is, terminal devices that can perform the operation corresponding to "lower temperature". The server finds that the refrigerator can also perform this operation and sends the voice command "lower temperature" to the refrigerator, so that the refrigerator performs the corresponding operation in response. In this way, the user can input a voice command once and control multiple terminal devices at the same time.
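The capability-based search in the "lower temperature" example can be sketched as a filter over a device registry, optionally excluding the requesting device. The registry contents are illustrative assumptions.

```python
# Sketch of capability-based distribution: find every registered device
# (optionally excluding the requester) whose local capability attribute
# parameters include the pending capability.

REGISTRY = {
    "air_conditioner": {"cooling", "heating", "lower temperature"},
    "refrigerator": {"refrigeration", "lower temperature"},
    "speaker": {"play music"},
}

def find_targets(capability, exclude=None):
    """Names of devices that can perform `capability`, sorted for stability."""
    return sorted(
        name for name, caps in REGISTRY.items()
        if capability in caps and name != exclude
    )
```

With the requester excluded, only the refrigerator remains as an additional target; without exclusion, both temperature-capable devices are returned.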
  • In some embodiments, a voice command may include both the device name and device capability parameters.
  • the voice command includes the device name "air conditioner” and the device capability parameter "reduce the temperature”. Then the air conditioner judges through the local capability filtering module that the local machine can perform the operation corresponding to the voice command, and at the same time, the device name of the air conditioner matches the device name carried in the voice command. Therefore, the air conditioner no longer sends an instruction distribution request to the server, and the server no longer searches for other terminal devices.
  • In some embodiments, the server searches for a matching second terminal device according to custom rules.
  • the voice command has a corresponding relationship with the terminal device.
  • FIG. 5 is a schematic diagram of another voice control system scenario of a terminal device provided in the embodiment of the present application.
  • For example, the custom rules may specify that device 2 gives priority to playing music, device 3 gives priority to playing movies and TV, and device 4 gives priority to playing audio novels; that is, the instruction to play music corresponds to device 2, the instruction to play movies and TV corresponds to device 3, and the instruction to play audio novels corresponds to device 4.
  • The local capability filtering module of device 1 first judges that device 1 cannot perform the operation corresponding to the voice command, and then device 1 sends an instruction distribution request to the server.
  • the server finds that the terminal device corresponding to the voice command "play music" is device 2, and then sends the voice command to device 2.
  • the server finds that the terminal device corresponding to the voice command "play video" is device 3, and then sends the voice command to device 3.
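The custom rules above are effectively a mapping from command type to preferred device. A minimal sketch mirroring the example (device names and command strings are illustrative):

```python
# Sketch of rule-based distribution: each command type maps to the device
# that should handle it by preference, per the custom rules.

CUSTOM_RULES = {
    "play music": "device 2",
    "play video": "device 3",
    "play audio novel": "device 4",
}

def route_by_rule(command):
    """Return the device assigned to this command by the custom rules."""
    return CUSTOM_RULES.get(command)
```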
  • the server includes a fusion capability rules database and an instruction distribution module.
  • The local capability attribute parameters of all devices are stored in the fusion capability rule database. Operators can update a device's local capability attribute parameters in the database; for example, if a terminal device has been upgraded with a new capability, the corresponding local capability attribute parameter needs to be added for that device.
  • In the fusion capability rule database, all devices are stored by device name and device ID.
  • the instruction distribution module receives the instruction distribution request sent by the terminal device, and can parse the capability attribute parameter to be processed from the voice instruction carried in the instruction distribution request. Afterwards, the command distribution module searches the fusion capability rule database for the local capability attribute parameters that match the capability attribute parameters to be processed, so as to find the terminal device that can perform the corresponding operation of the voice command.
  • the native capability filtering module of the terminal device may also search for native capability attribute parameters from the fusion capability rule database.
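The fusion capability rule database and the lookup performed against it can be sketched as below. The storage schema (a dict keyed by device ID, with name and capability set) is an assumption for illustration.

```python
# Sketch of the fusion capability rule database: devices stored by ID with
# name and capability set; operators can upsert entries when a device
# gains a new capability, and the instruction distribution module queries
# it for devices matching a pending capability.

class FusionCapabilityDB:
    def __init__(self):
        self._by_id = {}

    def upsert(self, device_id, name, capabilities):
        # Update (or insert) a device's local capability attribute parameters.
        self._by_id[device_id] = {"name": name, "caps": set(capabilities)}

    def find_capable(self, pending_capability):
        # Devices whose capability set contains the pending capability.
        return sorted(
            dev["name"] for dev in self._by_id.values()
            if pending_capability in dev["caps"]
        )
```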
  • When the user inputs a vague voice command, there may be multiple terminal devices that can perform the operation corresponding to that command.
  • the vague voice command may be a vague device control command, a vague media asset playback command, and the like.
  • In some cases, the voice command can be directly sent to the air conditioner in the living room, so that the air conditioner in the living room performs the start operation.
  • If both the air conditioner in the living room and the air conditioner in the bedroom can perform the operation corresponding to the voice command, additional attributes can be set to target a more specific device. For example, a time rule can be formulated: turn on the air conditioner in the living room from 11:00 to 14:00, and turn on the air conditioner in the bedroom from 15:00 to 17:00.
  • the voice command is sent to the air conditioner in the living room according to the time rule, so that the air conditioner in the living room performs the start operation.
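The time rule above can be sketched as a simple hour-based dispatch. The hour windows follow the text; treating the windows as half-open and returning no target outside them are assumptions.

```python
# Sketch of the time rule for disambiguating "turn on the air conditioner":
# 11:00-14:00 targets the living room unit, 15:00-17:00 the bedroom unit.
# Half-open intervals and the None fallback are assumptions.

def pick_air_conditioner(hour):
    if 11 <= hour < 14:
        return "living_room_ac"
    if 15 <= hour < 17:
        return "bedroom_ac"
    return None  # outside the configured windows: no rule applies
```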
  • the voice command input by the user may include multiple matching items, for example, may include device name, device response time period, space where the device exists, device capability parameters, and the like. Different terminal devices may satisfy the matching items included in the voice command at the same time.
  • the voice command includes four matching items: the device name, the device response time period, the space where the device exists, and the device capability parameter.
  • For example, device 1 satisfies the device name and device response time period matching items, while device 2 satisfies the space where the device exists and device capability parameter matching items.
  • the corresponding weight value can be set for each matching item.
  • the weight value of the device name is 10
  • the weight value of the time period is 5
  • the weight value of the space is 3
  • the weight value of the device capability parameter is 8. According to formula (1), the total weight value of each terminal device is the sum of the weight values a_i of the matching items that the device satisfies: W = a_1 + a_2 + ... + a_n. The final weight values of device 1 and device 2 are obtained respectively: the total weight value of device 1 is 15, and the total weight value of device 2 is 11.
  • Since device 1 has the higher total weight value, the server sends the voice command to device 1, so that device 1 performs the corresponding operation in response to the voice command.
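Formula (1) can be worked through in a short sketch: each device's total weight is the sum of the weights a_i of the matching items it satisfies, and the device with the highest total is chosen. The weight values follow the example in the text; the matched-item sets are chosen to reproduce the stated totals of 15 and 11.

```python
# Worked sketch of formula (1): total weight = sum of the weight values
# a_i of the matching items a candidate device satisfies. Weights follow
# the example in the text.

WEIGHTS = {"device_name": 10, "time_period": 5, "space": 3, "capability": 8}

def total_weight(matched_items):
    return sum(WEIGHTS[item] for item in matched_items)

def pick_device(candidates):
    """candidates: {device: [matched items]} -> device with highest total weight."""
    return max(candidates, key=lambda d: total_weight(candidates[d]))
```

With device 1 matching name and time period (10 + 5 = 15) and device 2 matching space and capability (3 + 8 = 11), device 1 is selected.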
  • the servers in this application can be divided into semantic servers and instruction distribution servers.
  • the semantic server is used to recognize voice commands from voice signals input by users.
  • the instruction distribution server stores a fusion capability rule database, which is used to search for terminal devices that can perform operations corresponding to voice instructions according to the instruction distribution request.
  • the semantic server can be a web server, while the instruction distribution server is a local server. Since the local server has the advantage of fast response, the response speed of the entire voice control process can be improved.
  • FIG. 6 is a signaling diagram of a voice control method for a terminal device provided in an embodiment of the present application.
  • The signaling diagram is shown in FIG. 6, and the method comprises the following steps:
  • S101 The microphone of the first terminal device receives a voice signal input by a user.
  • S102 The first terminal device sends the voice signal to the server.
  • S103 The server generates a voice command according to the voice signal, and feeds back the voice command to the first terminal device.
  • S104 The first terminal device judges whether it itself can perform the operation corresponding to the voice command.
  • After receiving the instruction distribution request, the server searches for a second terminal device that can perform the operation corresponding to the voice instruction according to the instruction distribution request, and sends the voice instruction to the second terminal device.
  • S108 The second terminal device performs a corresponding operation in response to the voice instruction.
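The S101–S108 flow above can be sketched in simplified form. All class and method names here are hypothetical, the speech recognition step is stubbed, and real devices would communicate over a network rather than via direct method calls.

```python
class Server:
    def __init__(self, registry):
        self.registry = registry  # maps a recognized command -> capable device

    def recognize(self, voice_signal):
        # S103: generate a voice command from the voice signal (stubbed)
        return voice_signal.strip().lower()

    def distribute(self, command):
        # After the instruction distribution request: find a second terminal
        # device that supports the command and forward the command (S108)
        device = self.registry.get(command)
        if device is None:
            return "no capable device"
        return device.execute(command)

class Terminal:
    def __init__(self, name, capabilities):
        self.name = name
        self.capabilities = capabilities

    def execute(self, command):
        return f"{self.name} executed '{command}'"

    def handle_voice(self, server, voice_signal):
        # S101-S102: receive the voice signal and send it to the server
        command = server.recognize(voice_signal)  # S103: command fed back
        # S104: judge whether this device itself can execute the command
        if command in self.capabilities:
            return self.execute(command)
        # otherwise send an instruction distribution request to the server
        return server.distribute(command)

speaker = Terminal("speaker", {"play music"})
tv = Terminal("tv", {"play movie"})
server = Server({"play movie": tv, "play music": speaker})

print(speaker.handle_voice(server, "Play Movie"))  # tv executed 'play movie'
```

Here the speaker cannot play a movie, so the command is distributed to the TV, mirroring the S104 branch in the signaling diagram.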
  • The specific process for the first terminal device to determine whether it itself can perform the operation corresponding to the voice command is as follows:
  • The local capability filtering module of the first terminal device may acquire the native capability attribute parameters from the fusion capability rule database. Afterwards, the native capability attribute parameters are matched against the capability attribute parameters to be processed; if they match, the first terminal device can execute the operation corresponding to the voice command, and if they do not match, it cannot.
  • the server searches for the second terminal device corresponding to the device name when searching for the second terminal device. For example, if the voice instruction is "turn on the speaker", the server searches for the speaker device according to the device name "speaker".
  • When searching for the second terminal device, the server may search by device capability parameter. For example, if the voice command is "decrease temperature", the capability parameter to be processed is recognized as "decrease temperature".
  • the instruction distribution module of the server can search the fusion capability rule database for the native capability attribute parameter matching the capability parameter of the device to be processed. That is, the terminal device that can perform the operation corresponding to the voice command is found.
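The lookup of native capability attribute parameters matching the capability parameter to be processed might look like the following sketch; the database contents and names are illustrative, not the actual fusion capability rule database.

```python
# Hypothetical fusion capability rule database: device -> native capabilities
FUSION_CAPABILITY_RULES = {
    "air_conditioner": {"decrease temperature", "increase temperature"},
    "speaker": {"play music", "volume up"},
    "lamp": {"turn on", "turn off"},
}

def find_capable_devices(pending_capability):
    """Return devices whose native capability attributes match the
    capability parameter to be processed."""
    return [device for device, caps in FUSION_CAPABILITY_RULES.items()
            if pending_capability in caps]

print(find_capable_devices("decrease temperature"))  # ['air_conditioner']
```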
  • the server searches for the second terminal device corresponding to the custom rule.
  • the custom rules include: device 2 gives priority to playing music, device 3 gives priority to playing videos, and device 4 gives priority to playing audio novels, etc. If the voice command input by the user is "play music", then the device 2 corresponds to the custom rule, and the device 2 is determined as the second terminal device.
  • Each matching item is set with a weight attribute value. If, when searching according to the matching items, the server finds at least two terminal devices each satisfying at least one matching item, it calculates, for each of these terminal devices, the total weight attribute value of all matching items it satisfies, that is, the sum of the weight values. The terminal device with the largest total weight attribute value is determined as the second terminal device.
  • The voice control system in the embodiments of the present application is a network system established within a specific local area network and based on a unified control service.
  • the voice control system may include a plurality of terminal devices 200 that establish communication connections with each other. Multiple terminal devices 200 can realize the communication connection relationship between the devices by accessing the same local area network. A plurality of terminal devices 200 can also directly form a point-to-point network through a unified communication protocol to realize a communication connection. For example, multiple terminal devices 200 may communicate with each other by connecting to the same wireless local area network. For another example, one terminal device 200 may also establish communication connections with other multiple terminal devices 200 through Bluetooth, infrared, cellular network, power carrier communication and other means.
  • the terminal device 200 refers to a device having a communication function, capable of receiving, sending, and executing control instructions and realizing specific functions.
  • the terminal device 200 includes, but is not limited to, a smart display device, a smart terminal, a smart home appliance, a smart gateway, a smart lighting device, a smart audio device, a game device, and the like.
  • the multiple terminal devices 200 constituting the voice control system may be of the same type or of different types. For example, as shown in FIG. 7 , in the same voice control system, smart TVs, smart speakers, smart refrigerators, multiple smart lamps, etc. may be included. These terminal devices 200 may be distributed in different locations, so as to meet usage requirements at corresponding locations.
  • The voice control system described in this application does not limit the scope of application of the solution to be protected. That is, in practical applications, the server, terminal device, and voice control method provided by this application are not limited to the smart home field; they apply equally to other systems that support intelligent voice control, such as smart office systems, smart service systems, smart management systems, and industrial production systems.
  • A terminal device 200 with a display function may include at least one of a tuner-demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface.
  • the controller 250 includes a CPU, a video processor, an audio processor, a graphics processor, a RAM, a ROM, and a first interface to an nth interface for input/output.
  • The display 260 includes a display screen component for presenting images and a drive component for driving image display, and is used for receiving image signals output from the controller and displaying video content, image content, menu manipulation interfaces, user manipulation UI interfaces, and the like.
  • the display 260 may be at least one of a liquid crystal display, an OLED display, and a projection display, and may also be a projection device and a projection screen.
  • The tuner-demodulator 210 receives broadcast TV signals by wired or wireless reception, and demodulates audio/video signals, as well as EPG data signals, from multiple wireless or cable broadcast TV signals.
  • the external device interface 240 may include, but is not limited to, the following: high-definition multimedia interface (HDMI), analog or data high-definition component input interface (component), composite video input interface (CVBS), USB input interface (USB), Any one or more interfaces such as RGB ports. It may also be a composite input/output interface formed by the above-mentioned multiple interfaces.
  • the controller 250 controls the work of the smart device and responds to user operations through various software control programs stored in the memory.
  • the controller 250 controls overall operations of the terminal device 200 . For example, in response to receiving a user command for selecting a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.
  • the user can input user commands through a graphical user interface (GUI) displayed on the display 260, and the user input interface receives user input commands through the graphical user interface (GUI).
  • the user may input a user command by inputting a specific sound or gesture, and the user input interface recognizes the sound or gesture through a sensor to receive the user input command.
  • the terminal device 200 also performs data communication with the server 400 .
  • the terminal device 200 may be allowed to communicate via a local area network (LAN), a wireless local area network (WLAN), and other networks.
  • the server 400 may provide various contents and interactions to the terminal device 200 .
  • the server 400 may be one cluster, or multiple clusters, and may include one or more types of server groups.
  • the terminal device 200-1 may have a built-in voice control system to support the user's intelligent voice control.
  • the intelligent voice control refers to an interactive process in which the user operates the terminal device 200-1 by inputting voice and audio data.
  • the terminal device 200-1 may include an audio input device and an audio output device.
  • the audio input device is used to collect voice and audio data input by the user, and may be a built-in or external microphone device of the terminal device 200-1.
  • The audio output device is used to emit sound to play the voice response. For example, as shown in FIG. 9, when the user inputs a wake-up word such as "Hi! Xiao X" through the audio input device, the terminal device 200-1 can play a voice response of "I'm here" through the audio output device to guide the user to complete the subsequent voice input.
  • the built-in intelligent voice system of the terminal device 200 also supports a one-language direct mode, that is, supports a "one-shot” mode.
  • In this mode, the user can directly realize the control function through a small number of voice inputs. For example, in the traditional mode, if the user wants to control the terminal device 200 to play movie resources, the user needs to first input the voice "Hi! Xiao X", then input "I want to watch a movie" after the terminal device 200 feeds back "I'm here", and the terminal device 200 then feeds back "the following movies have been found for you".
  • In the "one-shot" mode, the user can directly input "Hi! Xiao X, I want to watch a movie", and the terminal device 200 directly feeds back "found the following movies for you" after receiving the voice command, reducing the number of voice interactions and improving voice interaction efficiency.
  • the user can control the linkage of multiple devices through intelligent voice.
  • For example, the user can input a voice command "turn on the bedroom light" through the smart speaker; the smart speaker can respond to the voice command by generating a control command for turning on the light, and then send the control command to the lamp named "bedroom" in the voice control system, so as to control the turning on of the bedroom light.
  • the smart speaker also responds to the user's voice input, that is, it plays feedback voice content such as "the bedroom light has been turned on for you".
  • The control command can be directly transmitted to the controlled device by the terminal device 200-1 that receives the user's voice and audio data, or can be transmitted by the terminal device 200-1 to a specific relay device, such as a router, and then passed to the controlled device by the relay device.
  • the control instruction may also be transmitted to the controlled device through the server 400 .
  • For example, the smart terminal 300 can first send the control command to the server 400, and the server 400 then transmits the control command to the terminal device 200 for control.
  • the server 400 can issue control instructions and related data to any terminal device 200 independently.
  • the user can control the display device to request online playback of media assets through interactive operations, and the server 400 can feed back media asset data to the display device according to the playback request.
  • the server 400 can send control instructions and related data to the voice control system in a unified manner.
  • For example, the smart speaker can send the control command input by the user to the server 400; the server 400 sends feedback data to the voice control system so that the voice control system sends a turn-on command to the bedroom lamp, and the control response is fed back to the smart speaker.
  • Some terminal devices 200 in the voice control system can have a built-in complete voice control system.
  • This type of terminal device 200 can serve as a main control device, which can independently receive, process, and respond to voice input, and can send control instructions corresponding to the voice and audio to other terminal devices 200.
  • a complete voice control system may be built in terminal devices 200 such as a display device, a smart speaker, and a smart refrigerator, so as to receive voice and audio input by a user.
  • Part of the terminal devices 200 in the voice control system may not have a complete intelligent voice system built in, and only serve as controlled devices to receive control instructions sent by the master control device.
  • smart devices such as lamps and small household appliances can receive control instructions from the display device as the main control device to start, stop or change operating parameters.
  • the same voice control system may include multiple terminal devices supporting the voice control system.
  • a smart TV, a smart speaker, and a smart refrigerator are set in the same room, and these terminal devices 200 all have built-in complete voice control systems, which can respond to voice commands input by users.
  • However, the ways these devices actually respond to voice commands and the types of voice commands they support differ. For example, as shown in FIG. 10, for the voice command "I want to watch a movie" input by the user, the smart TV can respond by displaying a list of movies and feeding back the voice content "the following movies have been found for you", whereas the smart speaker and smart refrigerator cannot respond and will feed back the voice content "I can't understand what you are saying".
  • Since the current voice control system includes multiple terminal devices 200 capable of supporting voice control, for the same voice command, multiple terminal devices 200 may be woken up simultaneously or by mistake, resulting in scene confusion and seriously affecting the user experience.
  • The user can designate a response device through the application program in the smart terminal 300 according to usage habits, and freely switch between different wake-up strategies. For example, the user can manually set the smart speaker as the main response device; voice commands input by the user are then responded to by the smart speaker, and control commands are sent to other terminal devices 200 through the smart speaker, realizing intelligent voice control of the terminal devices in the entire voice control system.
  • the method of controlling the wake-up policy in a user-defined manner requires the user to perform multiple manual switching operations, which is not intelligent enough.
  • In the current execution process of multi-device wake-up, the devices to be woken up communicate with each other to determine which terminal device is currently woken up. This execution method carries significant risks.
  • Because the wake-up process requires information interaction between the terminal devices 200, it cannot be guaranteed that all terminal devices 200 will complete the information interaction within the specified time, which causes abnormal responses.
  • The wake-up delays of different types of terminal devices 200 differ, that is, the time from wake-up to response differs, so it cannot be guaranteed that different types of terminal devices 200 all fall within the device-information-interaction time period when making wake-up decisions. A terminal device 200 with a long wake-up delay may not yet have received the wake-up word during device information interaction, thus missing the time for device information interaction; such a terminal device 200 is then unable to respond to the voice, and abnormal voice control occurs.
  • the voice control system includes a server 400 and multiple terminal devices 200 .
  • the server 400 should at least include a storage module 410 , a communication module 420 and a control module 430 .
  • the storage module 410 is configured to store the device status reported by the terminal device 200 .
  • the communication module 420 is configured to establish a communication connection with a plurality of terminal devices 200 to obtain device statuses reported by the terminal devices 200 and to issue control instructions and related data to the plurality of terminal devices 200 .
  • the control module 430 is configured to execute the program steps on the side of the server 400 in the voice control method, so as to issue a response command or a silence command to different terminal devices 200 .
  • the terminal device 200 in the voice control system should at least include an audio input device, an audio output device, a communicator 220 and a controller 250 .
  • the audio input device is configured to detect voice audio data input by the user.
  • The audio output device is configured to play the voice response.
  • the communicator 220 is configured to establish a communication connection with the server 400 , so as to report the status of the device to the server 400 and receive a response instruction or a silent instruction issued by the server 400 .
  • the controller 250 is configured as a program step executed on the terminal device 200 side in the voice control method, so as to complete the response of the intelligent voice control process.
  • the voice control method includes the following contents:
  • the terminal device 200 acquires voice and audio data input by the user.
  • The user can perform voice input in real time, and the built-in audio input device of the terminal device 200 can convert the voice signal input by the user into an electrical signal, and obtain voice and audio data through a series of signal processing such as noise reduction, amplification, encoding, and conversion.
  • The user can input voice and audio data in various ways. That is, in some embodiments, the user can input voice and audio data through the built-in audio input device of the terminal device 200. For example, the user can input the voice "Hi! Xiao X, I want to watch a movie" through the built-in microphone device on the terminal device 200; the microphone then converts the voice signal into an electrical signal and transmits it to the controller 250 for subsequent processing.
  • the user may also include a specific wake-up word in the input voice command.
  • The wake-up word is a piece of speech containing specific content, such as "Hi! Xiao X", "Xiao X Xiao X", "Hey! X", and so on.
  • The terminal device 200 can judge whether the voice input by the user contains a wake-up word, and then perform subsequent processing, so as to alleviate false triggering of the intelligent voice control process.
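The wake-word gate described above can be sketched as follows; the wake words are placeholders, and a real system would match against audio features rather than transcribed text.

```python
# Hypothetical wake words; a real device matches acoustic patterns, not text
WAKE_WORDS = ("hi xiao x", "xiao x xiao x")

def contains_wake_word(transcribed_text):
    """Check whether the user's input contains a configured wake-up word."""
    text = transcribed_text.lower()
    return any(word in text for word in WAKE_WORDS)

def handle_input(transcribed_text):
    if not contains_wake_word(transcribed_text):
        return None  # no wake word: ignore input to avoid false triggering
    return "proceed to command processing"

print(handle_input("Hi Xiao X, I want to watch a movie"))  # proceed to command processing
print(handle_input("random living-room chatter"))          # None
```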
  • The terminal device 200 closer to the user will detect the user's voice and audio data first.
  • However, the terminal device 200 that should respond to the voice is uncertain; that is, the device capable of responding may be one closer to the user or one farther away. For example, when a user inputs the voice "Hi! Xiao X, I want to watch a movie" in the bedroom, the smart speaker in the bedroom detects the voice and audio data first, but the smart speaker has no video playback function, while the smart TV in the living room does.
  • the terminal device 200 will generate a voice instruction according to the voice and audio data.
  • the voice command is a control command, which has a specific command format, including control action functions, control object codes, and the like.
  • the terminal device 200 can first convert the voice and audio data into text through the voice processing module in the intelligent voice system, that is, convert the waveform data in the voice and audio data into text data.
  • the terminal device 200 can use a word segmentation tool to convert unstructured text data into structured text data. That is, the terminal device 200 can remove meaningless text content such as modal particles and auxiliary words in the text data by means of thesaurus matching, retain keywords in the text data, and separate multiple keywords according to word meanings to obtain structured text.
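The conversion from unstructured to structured text by dropping meaningless words might be sketched as follows; the stop-word lexicon and whitespace segmentation are simplified stand-ins for a real word segmentation tool and thesaurus matching.

```python
# Hypothetical lexicon of meaningless words (modal particles, auxiliaries)
STOP_WORDS = {"i", "to", "a", "the", "please", "um", "want"}

def structure_text(text):
    """Keep the keywords of the text data and drop meaningless words,
    returning structured tokens separated by word meaning."""
    tokens = text.lower().replace(",", " ").split()
    return [t for t in tokens if t not in STOP_WORDS]

print(structure_text("Um, I want to watch a movie"))  # ['watch', 'movie']
```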
  • the terminal device 200 may also input the structured text into the word processing model.
  • The word processing model is an artificial intelligence model based on machine learning. After text data is input, the word processing model can calculate the classification probability that the text information belongs to a specific semantic meaning. Therefore, by using various standard control instructions as classification labels, the word processing model can output the classification probability of the text data for each standard control instruction, where the standard control instruction with the highest classification probability is the control instruction corresponding to the voice and audio data.
  • the word processing model can be obtained by repeatedly training the initial model by using sample data and set input and output rules.
  • the sample data is text information with labels.
  • During training, the sample data is used as input and the classification probability as output, and calculations are performed on the sample data. The output result is compared with the label in the sample data to obtain a training error, and the training error is then back-propagated, that is, the model parameters are adjusted according to the training error, so that after a large amount of sample data is repeatedly input, a word processing model that outputs accurate recognition results can be obtained.
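The training loop described above (compare the output with the label, derive a training error, adjust parameters) can be illustrated with a toy perceptron standing in for the word processing model. The sample texts, labels, and update rule are invented for illustration and are far simpler than a real model.

```python
# Labeled sample data: text -> standard control instruction (classification label)
SAMPLES = [
    ("play some music", "music_play"),
    ("i want music", "music_play"),
    ("turn on the light", "light_power_on"),
    ("switch the lamp on", "light_power_on"),
]
LABELS = sorted({label for _, label in SAMPLES})
VOCAB = sorted({w for text, _ in SAMPLES for w in text.split()})

# one weight per (label, word); adjusted whenever the prediction is wrong
weights = {label: {w: 0.0 for w in VOCAB} for label in LABELS}

def score(label, words):
    return sum(weights[label].get(w, 0.0) for w in words)

def predict(text):
    words = text.split()
    return max(LABELS, key=lambda label: score(label, words))

for _ in range(10):                      # repeatedly input the sample data
    for text, label in SAMPLES:
        guess = predict(text)
        if guess != label:               # training error -> adjust parameters
            for w in text.split():
                weights[label][w] += 1.0
                weights[guess][w] -= 1.0

print(predict("play music"))         # music_play
print(predict("turn the lamp on"))   # light_power_on
```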
  • the terminal device 200 can convert the voice and audio data input by the user into voice instructions.
  • the controlled device or the server 400 can directly process the voice command after receiving the voice command, such as executing control actions according to the voice command and extracting service requirement information from the voice command.
  • The terminal device 200 can also directly send the voice and audio data as the voice command. That is, a terminal device 200 with low data processing capability, or without a built-in complete voice control system, can directly forward the voice and audio data, and the server 400 or other terminal devices 200 perform the language processing, so as to alleviate the computing load of the current terminal device 200.
  • the terminal device 200 may send the voice command to the server 400 to trigger the server 400 to perform control on the wake-up process of multiple terminal devices 200 .
  • the voice control system may include multiple terminal devices 200 with built-in voice control systems, when the user inputs voice, the multiple terminal devices 200 in the voice control system can all detect voice and audio data.
  • the server 400 may suspend the voice command generation process and the voice command reporting process in other terminal devices 200 after receiving a voice command.
  • For example, the server 400 can send a control command for suspending command generation and command transmission to the smart speaker and smart refrigerator in the voice control system where the smart TV is located; after receiving the control command, both the smart speaker and the smart refrigerator stop generating and sending voice commands. Since a terminal device 200 with higher data processing capability can usually complete the voice and audio data computation in a shorter time, it can complete the generation of the voice instruction before other devices. Therefore, after receiving the voice command that is sent first, the server 400 stops the voice command generation and reporting process of the other terminal devices 200, which shortens the voice command generation time and improves the voice response speed.
  • The server 400 may analyze the service requirement information in the voice command. Different voice commands input by the user contain different control content and therefore have different service requirements. For example, when the user inputs the voice "Hi! Xiao X, I want to listen to music", a voice command is generated after processing by the terminal device 200, and the voice command includes the service requirement of "play music" (music_play). When the user inputs the voice "Hi! Xiao X, turn on the bedroom light", a voice command containing the business requirement of "turn on the lamp" (light_power_on) is generated.
  • When the voice command carries the service requirement information, the server 400 can directly extract the service requirement information from the voice command. When the voice command is voice and audio data uploaded by the terminal device 200, the server 400 can also recognize and process the uploaded voice and audio data. That is, similar to the processing performed by the terminal device 200 on voice and audio data in the above embodiments, the server 400 can recognize the voice and audio data through built-in speech-to-text tools, text structuring tools, and word processing models, so as to identify the business requirement information therein.
  • For the server 400, a business requirement recognition model can be set, or the output classification of the above word processing model can be set as business requirements, so as to calculate, through the model, the classification probability of the user's voice and audio data for each business requirement.
  • Since the voice content input by the user may contain multiple user intentions, multiple service requirements may also be parsed from the corresponding voice instruction. For example, if the user inputs the voice "Hi! Xiao X, turn on the light in the living room and play a movie", the two business requirements of "turn on the light" and "play a movie" can be parsed out of the voice command.
  • The voice control system can also realize richer voice interaction functions by presetting a richer instruction set, and then determine the corresponding business requirements according to the set instruction set. For example, if the user inputs the voice "Hi! Xiao X, theater mode", the voice control system can determine the control content according to the instruction set of "theater mode", which includes playing a movie and turning off the lights at the same time, so as to imitate the atmosphere of a movie theater. Therefore, the server 400 can parse the two service requirements of "turn off the lamp" and "play a movie" out of the voice command.
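The expansion of a single command into multiple business requirements via a preset instruction set could be sketched as follows; the instruction-set contents and requirement identifiers are illustrative.

```python
# Hypothetical preset instruction sets: one command -> several requirements
INSTRUCTION_SETS = {
    "theater mode": ["light_power_off", "movie_play"],
    "good morning": ["light_power_on", "music_play"],
}

def parse_requirements(command):
    """Expand a command into business requirements via the instruction set;
    commands not in the set are treated as a single requirement."""
    return INSTRUCTION_SETS.get(command, [command])

print(parse_requirements("theater mode"))  # ['light_power_off', 'movie_play']
```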
  • Different service requirements correspond to different control operations performed by the terminal device 200, and correspond to different device states for the terminal device 200 that needs to respond to the voice command.
  • For example, a lamp can support on/off, brightness adjustment, and other controls only when it is in the standby state; when the user cuts off the lamp's power supply through the wall switch so that it goes offline, it cannot support on/off, brightness adjustment, or other controls.
  • the terminal device 200 can report the device status to the server 400 through a predetermined information reporting strategy.
  • the terminal device 200 may report the current device status to the server 400 every specific time according to the data update frequency, and the server 400 may update the stored device status according to the reported status of the terminal device 200 .
  • The server 400 may also send a heartbeat command to the terminal device 200, and the terminal device 200 may feed back its current device status to the server 400 after receiving the heartbeat command, so that the server 400 can update the stored device status. When the server 400 sends a heartbeat command to the terminal device 200 within a preset period and the terminal device 200 does not feed back a response, the server 400 may update the corresponding device status to an offline status.
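The heartbeat strategy above can be sketched as follows; the class and field names are hypothetical, and the timing detail (the preset period) is abstracted into whether any feedback arrived.

```python
class StatusStore:
    """Server-side store of the device statuses reported by terminals."""

    def __init__(self):
        self.status = {}

    def heartbeat(self, device_name, feedback):
        """feedback is the device's reported status, or None when the device
        did not answer the heartbeat within the preset period."""
        self.status[device_name] = feedback if feedback is not None else "offline"

store = StatusStore()
store.heartbeat("bedroom_lamp", "standby")   # device answered the heartbeat
store.heartbeat("kitchen_speaker", None)     # no feedback within the period

print(store.status)  # {'bedroom_lamp': 'standby', 'kitchen_speaker': 'offline'}
```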
  • the device state of the terminal device 200 may also be triggered to be reported through a voice command. That is, the server 400 may acquire the voice and audio data corresponding to the voice command, and recognize the wake-up word from the voice and audio data. If the voice and audio data includes the wake-up word, locate the voice control system where the terminal device 200 is located, so as to send a status acquisition request to the voice control system. All terminal devices 200 in the voice control system may report the device status after receiving the status acquisition instruction.
  • For example, the server 400 can recognize the wake-up word "Hi! Xiao X" from the voice and audio data; after recognizing the wake-up word, the server 400 can determine the voice control system currently used by the user according to the identification information of the terminal device 200, for example, "X's home system".
  • The voice control system has a smart TV, speaker A, and speaker B in the living room; a lamp and speaker C in the bedroom; and a smart refrigerator in the kitchen. The server then sends a status acquisition request to the voice control system, so that the TV, speaker A, speaker B, lamp, speaker C, and smart refrigerator in the voice control system report their current device status.
  • the server 400 may screen the second terminal device according to the service requirement information and the device status information.
  • the second terminal device is an intelligent device whose device status can realize service requirement information.
  • In the process of screening the second terminal device, the server 400 can perform multi-level screening of the terminal devices 200 in the current voice control system according to different preconditions. For example, if the user inputs the voice "Hi! Xiao X, turn on the light", the corresponding business requirement is "turn on the light".
  • The preconditions required to realize this business requirement are: the device type is a lamp, and the device status is standby. The server 400 can first filter out, by device type, all terminal devices 200 of the lamp type in the current voice control system, and then filter out, by device status, the lamps in the standby state as the second terminal device.
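The two-level screening above (first by device type, then by device status) might be sketched as follows; the device list and precondition table are illustrative.

```python
# Hypothetical devices in the current voice control system
DEVICES = [
    {"name": "living_room_lamp", "type": "lamp", "status": "standby"},
    {"name": "bedroom_lamp", "type": "lamp", "status": "offline"},
    {"name": "tv", "type": "display", "status": "standby"},
]

# preconditions needed to realize each business requirement
PRECONDITIONS = {"turn on the light": {"type": "lamp", "status": "standby"}}

def screen_second_devices(requirement):
    """Level 1: filter by device type; level 2: filter by device status."""
    cond = PRECONDITIONS[requirement]
    by_type = [d for d in DEVICES if d["type"] == cond["type"]]
    return [d["name"] for d in by_type if d["status"] == cond["status"]]

print(screen_second_devices("turn on the light"))  # ['living_room_lamp']
```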
  • After the second terminal device is screened out, the server 400 sends a response command to the terminal device 200 serving as the second terminal device, which can respond to the voice control function by running the response command. At the same time, the server 400 sends a silent command to the other terminal devices 200 in the current voice control system, so that the other smart devices run the silent command and do not respond to the voice control function.
  • the terminal devices 200 supporting voice interaction in the home environment report the received voice command and device status to the server 400, that is, the voice command "Turn on the light" and the device status (standby) are reported to the server 400.
  • the server 400 can determine that the current device status of a lamp matches the object category corresponding to the business requirement in the current user voice command. Therefore, the server 400 can issue a response command for waking up the lamp and, at the same time, issue a silent command to the other devices, so that each device side executes the corresponding command: the lamp that meets the business requirement and device status is turned on, while the other terminal devices 200 that do not meet the business requirement and device status remain silent.
  • the voice control method provided in the above embodiment can use the business requirement information contained in the voice command and the device status reported by each terminal device 200 to screen out the second terminal device capable of responding to the voice command in the current voice control system, send a response command to the second terminal device, and simultaneously send a silent command to the other devices. After the intelligent voice system receives the voice command input by the user, each terminal device 200 exchanges information with the server 400 individually, and the second terminal device is automatically determined by the server 400, reducing data interaction among the multiple terminal devices 200 and alleviating the problem of low execution rate caused by frequent communication between multiple devices.
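The two-level screening and command dispatch described above can be summarized in a minimal sketch. The device records, field names, and command strings here are illustrative assumptions, not the actual protocol of the embodiment:

```python
def screen_second_devices(devices, required_type, required_state):
    """Two-level screening: first by device type, then by device state."""
    by_type = [d for d in devices if d["type"] == required_type]
    return [d for d in by_type if d["state"] == required_state]

def dispatch(devices, second_devices):
    """Send a response command to matches and a silent command to the rest."""
    matched_ids = {d["id"] for d in second_devices}
    return {d["id"]: ("respond" if d["id"] in matched_ids else "silent")
            for d in devices}

# Hypothetical voice control system: "Hi! XX, turn on the light"
devices = [
    {"id": "tv-livingroom", "type": "tv", "state": "on"},
    {"id": "lamp-bedroom", "type": "lamp", "state": "standby"},
    {"id": "speaker-a", "type": "speaker", "state": "playing"},
]
second = screen_second_devices(devices, "lamp", "standby")
print(dispatch(devices, second))
```

Only the standby lamp receives the response command; every other device receives the silent command, so no data needs to pass between the terminal devices themselves.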
  • the executing device may be explicitly specified in the voice instruction when the user inputs the voice. For example, if the voice content input by the user is "turn on the TV", the executing device is specified as the TV. In this case, since there is a clearly specified executing device, the server 400 can transmit the voice command directly to the TV, and the executing device can be determined without parsing the business requirement information. Therefore, in some embodiments, after the server 400 receives the voice command reported by the terminal device 200-1, it may first detect whether an executing device is specified in the voice command. If no specific executing device is named in the voice instruction, the second terminal device is screened by parsing the business requirement information and matching it against the device status, in the manner provided in the above embodiments.
  • the control command and feedback voice information can be generated according to the voice instruction.
  • the control command is a command corresponding to the voice command and oriented to the execution device.
  • for example, when the user inputs the voice "turn on the TV", the correspondingly generated control command is "TV_power on".
  • feedback voice information is voice audio played in response to the voice content, used to inform the user of the execution result of the instruction. For example, when the user inputs the voice "Turn on the TV", the intelligent voice system plays the feedback voice "Turned on the TV for you" after turning on the TV.
  • the control command and the feedback voice information can be sent to the specific terminal device 200, so as to implement the service corresponding to the control command by executing the control command, and prompt the user of the service execution result by playing the feedback voice information.
  • both the control command and the feedback voice information can act on the executing device. For example, when the user inputs the voice "turn on the TV", the TV powers on in response, and at the same time the voice feedback "I have turned on the TV for you" is played through the TV's intelligent voice system and speakers.
  • the server 400 may send control commands and feedback voice information to different terminal devices 200 respectively. That is, the server 400 may send the control command to the execution device according to the identification information of the execution device, and send the feedback voice information to the smart device that inputs the voice command.
  • the smart air conditioner with the intelligent voice system in the bedroom first detects the voice audio data, generates a voice command, and sends it to the server 400. The server 400 can determine from the voice command that the TV is the executing device, and generate the "TV_power on" control command and the feedback voice "Turning on the TV for you" according to the voice command. It then sends the control command to the TV in the living room to turn it on, and sends the feedback voice information to the smart air conditioner, so that the voice feedback "Turning on the TV for you" is played through the smart air conditioner in the bedroom.
  • when there is a clearly specified executing device in the voice command, the voice command can be responded to jointly by the executing device and the terminal device 200 that received the voice input, so as to meet the business need and give the user a better feedback effect.
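The split delivery described above — control command to the executing device, feedback voice to the device that captured the input — can be sketched as follows. All field names and the command-string format are hypothetical:

```python
def route(parsed, devices):
    """Return per-device payloads: the control command goes to the
    execution device, the TTS feedback to the device that heard the user."""
    target = next(d for d in devices if d["type"] == parsed["target_type"])
    control = f'{parsed["target_type"].upper()}_{parsed["action"]}'
    return {
        target["id"]: ("control", control),
        parsed["input_device_id"]: ("play_tts", parsed["feedback"]),
    }

devices = [
    {"id": "tv-livingroom", "type": "tv"},
    {"id": "ac-bedroom", "type": "air_conditioner"},
]
# The bedroom air conditioner heard "turn on the TV"
parsed = {
    "target_type": "tv",
    "action": "power_on",
    "feedback": "Turning on the TV for you",
    "input_device_id": "ac-bedroom",
}
print(route(parsed, devices))
```

The TV in the living room receives the control payload while the air conditioner that captured the voice plays the feedback audio, matching the scenario above.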
  • the voice control system may contain multiple terminal devices 200, and different terminal devices 200 may support the same business need and be in the same device state at the same time. When devices are screened through the methods in the above embodiments, multiple second terminal devices may therefore be screened out.
  • if the server 400 directly sends a response command to every terminal device 200 serving as a second terminal device, multiple terminal devices 200 will respond to one voice command at the same time, and the problem of scene confusion remains.
  • the server 400 may further refine the screening process by adding screening conditions, so as to reduce the number of terminal devices 200 serving as second terminal devices. That is, in some embodiments, the business requirement information may further include a service type and a service state. When the server 400 screens the second terminal device according to the business requirement information, it may extract the service type and service state from the business requirement information and match candidate devices meeting the service type, where a candidate device has the device type required by the service type. It then traverses the device states of the candidate devices to filter out the second terminal devices whose device state conforms to the service state.
  • the terminal devices 200 in the home environment report the received voice command, the current device type, and the device status to the cloud server 400, that is, the device type (music) and device status (playing).
  • the server 400 can filter the corresponding device type and device status in the current voice control system according to the service type and service status required in the voice command, and determine that there is a speaker that is playing music.
  • the current device type and device status conform to the object category of the current user's voice command. Therefore, the server 400 can send a response command to the corresponding speaker, and send a silence command to other devices at the same time, so that the speaker device in the current voice control system executes the corresponding response command and performs the operation of turning off the music.
  • the service requirement information further includes a service execution location
  • the server 400 may further screen the terminal devices 200 according to the service execution location to determine the second terminal device. That is, when screening the second terminal device according to the business requirement information, the server 400 can extract the service execution location from the business requirement information and obtain the device locations of the candidate devices in the current voice control system. If the device location of a candidate device coincides with the service execution location, that is, the candidate device satisfies the service execution location, the step of traversing the device states of the candidate devices may be performed to screen out the second terminal devices whose device state meets the service state. If the device location of a candidate device does not coincide with the service execution location, the candidate device is marked as not being the second terminal device, that is, it can be deleted from the candidate device list.
  • the terminal device 200 in the home environment will receive the user instruction and report the current device type and device status to the cloud server, that is, the device type (none) and device status (standby).
  • the server 400 can parse the service execution location "bedroom" from the voice command, screen the terminal devices 200 in the current voice control system according to the service execution location, and identify the terminal devices 200 whose device location is within the bedroom. Therefore, upon determining that a speaker in the bedroom matches the current device type and that its device status meets the object category controlled by the current user instruction, the server 400 may issue a response instruction to that speaker.
  • the server 400 also sends a silent command to other devices in the current voice control system, including devices in the bedroom and devices outside the bedroom.
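The location-then-state screening above can be sketched with hypothetical candidate records:

```python
def screen_by_location(candidates, service_location, service_state):
    """Keep only devices in the requested room, then match device state."""
    in_room = [d for d in candidates if d["location"] == service_location]
    return [d for d in in_room if d["state"] == service_state]

# "Play music in the bedroom": only bedroom devices survive the first pass
candidates = [
    {"id": "speaker-c", "location": "bedroom", "state": "standby"},
    {"id": "speaker-a", "location": "livingroom", "state": "standby"},
    {"id": "lamp-bedroom", "location": "bedroom", "state": "on"},
]
print(screen_by_location(candidates, "bedroom", "standby"))
```

Devices outside the bedroom and devices in the wrong state are both dropped; each of them would subsequently receive the silent command.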
  • the voice control method provided in the above embodiment can perform multiple rounds of screening on the terminal devices 200 in the voice control system based on business requirement information such as service type, service state, and service execution location, so as to determine a small number of terminal devices 200 as the second terminal devices, reducing the communication frequency between terminal devices 200 and improving the execution efficiency of the intelligent voice control process.
  • the server 400 can screen out the second terminal device that can respond to the control instruction from among the many terminal devices 200 .
  • although the number of terminal devices 200 serving as second terminal devices can be greatly reduced, part of the screening process may still leave many terminal devices 200 that can meet the business requirement, while the user's voice control process usually requires only one or a few specific second terminal devices to respond.
  • the server 400 may further determine the final execution device from among the screened multiple terminal devices 200 that can meet service requirements. That is, when screening the second terminal device according to the service requirement information, the server 400 may perform the following steps:
  • S202 If the number of terminal devices is equal to 1, that is, only one terminal device 200 in the current voice control system can meet the current service demand, the server 400 can directly mark that terminal device 200 as the second terminal device.
  • S203 If the number of terminal devices is greater than or equal to 2, search for a master device.
  • the master device is one of multiple terminal devices capable of realizing service requirement information.
  • the main device may perform further interaction with the user to determine the second terminal device that finally responds to the voice instruction.
  • the server 400 may send an inquiry instruction to the master device, so that the master device plays an inquiry voice, where the inquiry instruction is a multi-round, wake-up-free voice interaction instruction. The server then receives the confirmation voice command input by the user through the master device and extracts the identification information of the second terminal device from the confirmation voice command, so as to screen the second terminal device among the multiple smart devices that can realize the business requirement information according to that identification information.
  • the server 400 can filter out the terminal device 200 that meets the service requirement according to the service requirement in the voice command.
  • the server 400 may send a response instruction to the speaker A, and send a silence instruction to other terminal devices 200 including the speaker B.
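The multi-round disambiguation through the master device might look like the following sketch; the prompt wording and the simple name-matching heuristic are assumptions:

```python
def disambiguate(matches, ask, listen):
    """Multi-round interaction via the master device: ask which of the
    matching devices should respond, then match the confirmation reply."""
    if len(matches) == 1:
        return matches[0]
    names = ", ".join(d["name"] for d in matches)
    ask(f"{len(matches)} devices can do this: {names}. Which one?")
    answer = listen().lower()
    for d in matches:
        if d["name"].lower() in answer:
            return d
    return None  # no device named in the reply; could re-ask here

# Hypothetical stubs for the master device's speaker and microphone
prompts = []
matches = [{"name": "speaker A"}, {"name": "speaker B"}]
chosen = disambiguate(matches, prompts.append, lambda: "Turn off speaker A")
print(chosen)
```

After the confirmation reply names speaker A, the server would send it the response instruction and send speaker B (and all others) the silent instruction.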
  • the main device may be the terminal device 200 closest to the location of the sound source corresponding to the voice command.
  • when the server 400 searches for the master device, it can obtain the voice audio data detected for the voice command by the multiple smart devices that can realize the business requirement information, extract a sound energy value from each device's voice audio data, and compare the sound energy values to obtain the terminal device 200 with the highest sound energy value, marking that terminal device 200 as the master device.
  • the reverberation time parameter T60 in a given scene is fixed, that is, the time required for the sound energy to decay by 60 dB is the same at any position, and T60 can be estimated from the energy ratio of the direct sound to the reverberant sound at the corresponding position. Therefore, based on the beamformed spectrogram and the time difference of arrival of the sound source, the energy ratio of direct sound to reverberant sound can be calculated for every terminal device 200 in the environment, and from it the direct sound energy. By ranking the direct sound energy of the sound source received by each device, the terminal device 200 closest to the sound source position can be determined as the master device.
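The energy-based master selection can be sketched as below. The direct/reverberant split assumes each device's direct-to-reverberant ratio (DRR) has already been estimated from the beamformed signals, which is outside this sketch:

```python
def direct_energy(total_energy, drr):
    """Split measured energy into its direct component using the
    direct-to-reverberant ratio (direct / reverberant = drr)."""
    return total_energy * drr / (1.0 + drr)

def pick_master(readings):
    """readings: device id -> (total energy, estimated DRR).
    The device receiving the most direct sound energy is assumed
    closest to the sound source and is marked as the master device."""
    return max(readings, key=lambda k: direct_energy(*readings[k]))

# Hypothetical measurements: speaker C is closer (high DRR)
readings = {"speaker-a": (1.0, 0.5), "speaker-c": (0.9, 3.0)}
print(pick_master(readings))
```

Note that ranking by raw total energy alone could be misled by reverberation; discounting the reverberant component is why the DRR enters the comparison.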
  • the master device may also be determined by other methods. That is, in some embodiments, detection of the distance between the sound source location and each terminal device 200 can be completed by the terminal devices 200 themselves: a terminal device 200 can acquire images of the current environment through multi-lens cameras, construct a three-dimensional space model from the images taken at different angles, and then extract a portrait from the three-dimensional space model using image recognition, so as to locate the user's position in the model, that is, the position of the sound source.
  • after locating the position of the sound source, the terminal device 200 determines the distance between the sound source position and each terminal device 200 according to the layout of the current smart home model, and finally sends the calculated distances to the server 400, so that the server 400 can determine that the terminal device 200 closest to the sound source is the master device.
  • the server 400 can further select, from among the multiple terminal devices 200, the second terminal device that will finally execute the response to the voice control, so that no frequent communication between multiple devices occurs before the voice control process, improving the response speed of the voice interaction process.
  • the server 400 can determine the second terminal device and, by issuing a response instruction, enable it to make an interactive response to the voice input by the user. Since the interactive response process can control the second terminal device to perform specific interactive actions, and these actions may change the device status of the terminal device 200, after sending the response command the server 400 can also obtain the device state of the second terminal device after it has executed the response command, so as to update the stored device state in real time.
  • the server 400 may receive the execution result data reported by the second terminal device after sending a response instruction to the second terminal device.
  • the execution result data includes the new state of the device after running the response instruction. Then extract the new state of the device from the execution result, and use the new state of the device to update the state of the device stored in the storage module.
  • the device state stored in the server 400 can thus be kept consistent in time with the actual device state of the terminal devices 200 in the voice control system, so that in subsequent intelligent voice interaction processes the server 400 can screen the terminal devices 200 based on the updated device state and determine the second terminal device more accurately.
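A minimal sketch of this state-update step, assuming the execution-result report carries a device id and its new state (the field names are illustrative):

```python
class DeviceStateStore:
    """Keeps the server's cached device states in sync with reports."""

    def __init__(self, initial=None):
        self._states = dict(initial or {})

    def update_from_result(self, result):
        # result is the execution-result report described above;
        # the field names here are assumptions
        self._states[result["device_id"]] = result["new_state"]

    def get(self, device_id):
        return self._states.get(device_id)

store = DeviceStateStore({"lamp-bedroom": "standby"})
# The lamp ran the response command and reports its new state
store.update_from_result({"device_id": "lamp-bedroom", "new_state": "on"})
print(store.get("lamp-bedroom"))
```

The next screening pass then sees the lamp as "on" rather than "standby", which is exactly what keeps the server's view consistent with the real devices.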
  • a server 400 including: a storage module 410 , a communication module 420 and a control module 430 .
  • the control module 430 is configured to perform the following program steps:
  • S303 Screen the second terminal device according to the service requirement information, where the second terminal device is an intelligent device whose device status can realize the service requirement information.
  • S305 Send a silent command to other smart devices other than the second terminal device in the current voice control system.
  • a terminal device 200 is also provided in some embodiments of the present application, including: an audio input device, an audio output device, a communicator 220 and a controller 250 .
  • the controller 250 is configured to perform the following program steps:
  • S401 Acquire voice and audio data input by a user for performing voice control.
  • S402 Generate a voice instruction according to the voice audio data.
  • S403 Send a voice instruction to the server, so that the server parses the service requirement information in the voice instruction, and screens a second terminal device according to the service requirement information, and the second terminal device is an intelligent device whose device status can realize the service requirement information.
  • S404 Receive a response instruction or a silent instruction sent by the server.
  • the server 400 and the terminal device 200 provided in the above embodiment can form a voice control system for implementing the above voice control method.
  • the server 400 may parse the business requirement information from the voice command after the user inputs it, and screen the second terminal device whose current device status can realize the business requirement, so as to send a response command to the second terminal device and have the smart device serving as the second terminal device make a voice response. At the same time, according to the screening result, the server 400 sends a silent instruction to the other devices in the current voice control system, so that the terminal devices 200 that are not the second terminal device do not respond to the voice control function.
  • the server 400 can pre-process voice commands, so that all types of terminal devices 200 can quickly and efficiently make correct wake-up responses within a specified time, and solve the problem of abnormal responses in traditional voice wake-up methods.
  • FIG. 18 is an application scenario diagram of a terminal device provided by an embodiment of the present application.
  • terminal devices such as smart TV 200-5, smart air conditioner 200-2, smart refrigerator 200-4, and smart washing machine 200-3 in the home can be connected to smart terminal 300 and server 400 through the Internet of Things module.
  • data transmission can be performed through a local area network or a wide area network, so as to realize the control and management of the terminal equipment.
  • IoT modules can be built into individual end devices.
  • the first terminal device is generally a device with certain functions of its own. For a received voice, if no terminal device is specified, the first terminal device itself is considered the responding device; if a terminal device to be operated is specified, a control command is sent to that device. Taking a smart speaker as an example: after receiving "fast forward", it first judges whether the smart speaker itself has an operation corresponding to the command; if so, it executes the fast-forward operation, and if not, it feeds back a prompt indicating the command was not recognized. If "TV fast forward" is received, the smart speaker first judges whether a TV with a corresponding operation exists in its mapping relationship. If it exists, it sends the TV an operation command that triggers fast forward; if not, it feeds back a prompt indicating the command was not recognized.
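The smart-speaker dispatch logic above can be sketched as follows; the return-value shapes and the prefix-matching of device names are assumptions made for illustration:

```python
def handle_voice(text, local_ops, mapped_devices):
    """First-device dispatch: a named device gets the command forwarded,
    an unnamed command is tried locally, anything else is unrecognized."""
    for name, ops in mapped_devices.items():
        if text.startswith(name + " "):
            cmd = text[len(name):].strip()
            return ("forward", name, cmd) if cmd in ops else ("unrecognized",)
    return ("local", text) if text in local_ops else ("unrecognized",)

local_ops = {"fast forward", "pause"}          # what the speaker itself can do
mapped = {"TV": {"fast forward", "power on"}}  # mapping to other devices
print(handle_voice("fast forward", local_ops, mapped))
print(handle_voice("TV fast forward", local_ops, mapped))
```

"fast forward" is executed locally, while "TV fast forward" is forwarded to the mapped TV, mirroring the two branches described above.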
  • the first terminal device may play a prompt to the user listing the terminal devices for which mappings exist.
  • the determination of the mapping relationship between the first terminal device and other terminal devices may be performed in a local module or in the cloud.
  • the local module may be built into the first terminal device or attached to it; it may also be a separate component.
  • FIG. 19 is another application scenario diagram of a terminal device provided in an embodiment of the present application.
  • if the user wants to listen to the song XXX, the user first inputs the wake-up word "Hi! XX". The first terminal device then activates the voice application installed in the device, and the user inputs the voice message "play the song XXX" to the first terminal device. The first terminal device converts the voice message into a voice command through the voice application and transmits it to the server 400. After receiving the voice command, the server 400 queries the terminal devices currently configured by the user and, finding a TV, a refrigerator, and an air conditioner, feeds back to the user through the first terminal device: "You have 3 devices; which one do you want to use?" If the user is in the living room at this time, the user may continue with the voice message "play it in the living room". The server 400 then determines that there are two terminal devices in the living room, a TV and an air conditioner, and feeds back through the first terminal device: "You have 2 devices in the living room; which one do you want to use?" The user then inputs "play it on the TV", whereupon the server 400 controls the TV to play the song XXX.
  • the first terminal device may be a device with a radio function such as a smart remote controller and a smart speaker.
  • the server 400 needs to perform multiple rounds of voice interaction with the user, prompting the user at each round, and finally determines the second terminal device that will execute the command through this process of repeated interaction with the user.
  • the present application provides a server in some embodiments, and the server is configured to perform a voice interaction process.
  • the voice interaction process will be described below with reference to the accompanying drawings.
  • the method described as being executed in the server 400 may instead be executed in another terminal device or in the first terminal device. The following description takes server-side execution as an example.
  • FIG. 20 is a schematic flowchart of a voice control method provided by an embodiment of the present application. As shown in Figure 20, the method includes the following steps:
  • the first terminal device and other terminal devices such as TVs, refrigerators, and air conditioners are placed in the user's actual environment. Both the first terminal device and the other terminal devices can be connected to the server through the network, and the user has logged all of these terminal devices into the same account in advance on some terminal device; the server side stores the mapping relationship between the account's user ID and the terminal devices. At some moment, the first terminal device receives the voice input by the user.
  • the user may send a voice command to the first terminal device based on his own needs, and control the corresponding second terminal device to execute the voice command by voice. For example, when the user wants to use the second terminal device to watch the XXX movie, he can enter the wake-up word "Hi! XX" into the first terminal device, and then input a voice message of "Play XXX movie" to the first terminal device.
  • the second terminal device and the first terminal device are two independent devices, and the second terminal device can have its own voice receiving device for voice control.
  • the relationship between the second terminal device and the first terminal device is not that between a TV and the remote control used for that TV, nor that between a TV's processor and the voice receiving device on the TV.
  • the wake word and voice command can be entered sequentially.
  • the first terminal device can recognize the voice instruction contained in the sentence containing the wake-up word.
  • the first terminal device can integrate functions of sound collection and speech analysis.
  • the radio function is to receive the voice message sent by the user
  • the voice analysis function refers to extracting the key part of the voice message sent by the user.
  • the key part reflects the user's intention or the task to be done; after the user's intention is analyzed, it is converted into a voice command, which may be in an executable command format agreed between the first terminal device and the server 400.
  • the command format may include command (command) and parameters (parameter).
  • the voice commands are sent to the server 400 and received by the server 400 .
  • adding the corresponding user identifier to the voice command makes it easy for the server 400 to identify which user sent the voice command, and also facilitates searching for all terminal devices configured by that user.
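One way such a command-plus-parameter structure carrying a user identifier could be serialized is sketched below; all field names are illustrative, not the agreed format from the embodiment:

```python
import json

# Hypothetical serialization of the voice instruction "play the song XXX"
instruction = {
    "command": "media.play",              # the agreed command field
    "parameter": {"title": "XXX song"},   # the agreed parameter field
    "user_id": "user-123",                # identifies who issued it
}
payload = json.dumps(instruction)
print(payload)
```

On receipt, the server deserializes the payload, reads `user_id` to look up the user's configured devices, and reads `command`/`parameter` to determine the intent.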
  • the first terminal device has an independent audio-visual display system capable of decoding and playing audio and video streams.
  • the second terminal device also has an independent audio and video display system, which can decode and play audio and video streams.
  • S2001 Receive a voice instruction including a user identifier sent by a first terminal device.
  • the first terminal device may use the text converted from the voice as a voice instruction and send it to the server.
  • alternatively, the speech-to-text conversion service resides in the server, so the first terminal device can send the received voice to the server according to an agreed encapsulation structure, and the server decapsulates the received data to obtain the voice command.
  • the server 400 analyzes the voice command to obtain the user identifier and the user's intention conveyed by the voice command.
  • the server determines the device type corresponding to the command by analyzing the voice command and according to the keywords in the parsed text, where the device type refers to the functional authority of the terminal device.
  • mapping relationship between keywords and device types may be cached in the server, and in some embodiments, a trained keyword-device type neural network model may also be stored.
  • the server 400 searches the database for all terminal devices related to the user ID, that is, all terminal devices configured by the user. The reason is that if device discovery were based on near-field communication, a device could discover everything it scans, including devices of other users in close proximity, some of which do not belong to the user; even judging by the LAN, a guest device connected to the LAN could be selected by mistake. By pre-establishing the relationship between the user ID and the user's own equipment in a server with a management function, the server 400 can determine the equipment actually owned by the user from the user ID.
  • the terminal device refers to other devices associated with the user identifier except the first terminal device.
  • the recognition of the voice command needs to determine the type of the voice command, and then compare it with the types that can be executed by each terminal device, and then determine the terminal device that can execute the voice command.
  • the execution types of each terminal device may be pre-calibrated; for example, the kitchen refrigerator corresponds to freezing, refrigeration, recipe recommendation, and ingredient identification. They may also be determined from the device identifier when a newly added device is scanned: for example, when the newly added device identifier indicates that the device is a refrigerator, an association is established between the device identifier and freezing, refrigeration, recipe recommendation, ingredient identification, and so on.
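The association of a newly scanned device with its executable operation types can be sketched as a simple type-to-capability table; the refrigerator capabilities come from the example above, the rest are assumptions:

```python
# Pre-calibrated mapping from device type to executable operation types
DEVICE_TYPE_CAPABILITIES = {
    "refrigerator": ["freezing", "refrigeration",
                     "recipe_recommendation", "ingredient_identification"],
    "tv": ["video_playback", "audio_playback"],  # illustrative entry
}

def register_device(registry, device_id, device_type):
    """When a new device is scanned, associate its id with the
    operations its type can execute."""
    registry[device_id] = DEVICE_TYPE_CAPABILITIES.get(device_type, [])
    return registry

registry = register_device({}, "fridge-kitchen", "refrigerator")
print(registry)
```

Matching a voice command's type against these per-device capability lists is then what determines which terminal devices can execute the command.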
  • the server 400 may also load all terminal devices and device attributes related to the user ID from the database into the cache. Subsequently, when the server 400 searches for the second terminal device, it may directly search in the cache, so that the speed at which the server 400 searches for the second terminal device can be accelerated.
  • the device attributes include inherent attributes such as the location of the terminal device, the name of the terminal device, and the ID of the terminal device.
  • if the server 400 does not find a terminal device related to the user identifier, the user has not configured any terminal device. In that case, the server 400 feeds back to the first terminal device a parameter indicating that there is no terminal device, and the first terminal device announces the absence of a terminal device according to the received parameter.
  • S2004 When there is a terminal device related to the user identifier, use preset filtering rules to filter out the most matching second terminal device, and feed back parameters characterizing the best matching second terminal device, so that the first terminal The device announces that there is a second terminal device that executes the voice command, and controls the most matching second terminal device to execute the voice command.
  • the preset filtering rule refers to a rule for filtering out devices that conform to the user's intention, i.e., devices capable of executing the voice instruction. For example, when the user inputs the voice "play folk music", the first terminal device sends an instruction to play folk music to the server; the server recognizes that the type of the user's intention is playing music, and then determines according to that type the second terminal device capable of performing this type of operation.
  • the terminal device related to the user identifier when there is a terminal device related to the user identifier, the terminal device related to the user identifier is used to execute the voice instruction.
  • the voice instruction is an instruction to play a video
  • the device that cannot perform video playback will be screened out.
  • the voice command can be executed by the cooling device.
  • the server 400 is configured with preset filtering rules.
  • the filtering rule characterizes the mapping relationship between the voice instruction and the terminal device.
  • the filtering rules include a first group of rules and a second group of rules, or only the first group of rules without the second group, where the first group of rules refers to the rules for filtering out the most matching second terminal device, and the second group of rules refers to the rules that are superimposed and applied one by one when no best-matching second terminal device has been screened out.
  • the first group of rules includes the mapping relationship between device types and each terminal device, for example, the mapping relationship shown in Table 1 in the above embodiments.
• The first group of rules may also involve judging the number of terminal devices corresponding to the user identifier.
• When the first group of rules leaves more than one candidate, the second group of rules is needed for further screening.
• Optional second terminal devices are determined according to the device type and the mapping relationship of each device. If the number of second terminal devices is 1, the voice instruction is sent directly to that device; if it is 0, a "no executable device" result is fed back to the first terminal device; if it is more than 1, one or any combination of location, installation time, usage frequency, the device that last executed this type of command, start-up time, signal strength, and priority can be used as the second group of rules for further filtering.
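The counting logic above can be sketched as follows; the device-type mapping and device names are illustrative assumptions, not taken from the patent:

```python
# Sketch of first-group-rule filtering: map the intent's device type to
# candidate devices the user owns, then branch on how many candidates remain.
# DEVICE_TYPE_MAP and the device names are illustrative assumptions.
DEVICE_TYPE_MAP = {
    "play_media": ["smart_tv", "smart_speaker", "smart_fridge_screen"],
    "adjust_temperature": ["smart_ac", "smart_fan"],
}

def first_group_filter(intent_type, user_devices):
    """Return ('none' | 'unique' | 'ambiguous', candidates)."""
    capable = DEVICE_TYPE_MAP.get(intent_type, [])
    candidates = [d for d in user_devices if d in capable]
    if not candidates:
        return "none", []            # feed back "no executable device"
    if len(candidates) == 1:
        return "unique", candidates  # send the voice command directly
    return "ambiguous", candidates   # fall through to the second-group rules
```

A `"none"` result corresponds to the server feeding back the no-device parameter, `"unique"` to direct dispatch, and `"ambiguous"` to invoking the second group of rules.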
  • FIG. 21 is a schematic flowchart of a screening terminal device provided by an embodiment of the present application.
  • the process of screening the second terminal device according to the filtering rules is as follows:
  • S2101 Use the first set of rules to screen the current terminal device and the terminal device related to the user identifier.
• When there is a terminal device related to the user identifier, the server 400 first screens the terminal devices according to the rules necessary for screening out the best-matching second terminal device.
• The first group of rules includes a terminal-device function-authority sub-rule, which refers to the functions each terminal device actually has; for example, a smart TV has the function of playing media assets, and an air conditioner has the functions of cooling and heating. That is, after the server 400 identifies the terminal devices actually owned by the user, it further applies the function-authority sub-rule to exclude terminal devices that lack the corresponding function authority.
• When screening terminal devices with the function-authority sub-rule, the server 400 detects the function authority of each of the multiple terminal devices related to the user identifier.
• If a terminal device has the corresponding function authority, the server 400 keeps it as a candidate terminal device; otherwise, the server 400 excludes it.
• When using the first group of rules to filter the terminal devices related to the user identifier, if none of the terminal devices actually owned by the user has the corresponding function authority, the server 400 feeds back the parameter indicating that no terminal device exists to the first terminal device, and the first terminal device announces the absence of a terminal device according to that parameter.
• If at least one terminal device actually owned by the user has the corresponding function authority, the server 400 further confirms the number of terminal devices with that authority.
• When the server 400 detects that only one terminal device has the authority to execute the voice command, it feeds back a parameter representing that device to the first terminal device, and the first terminal device announces the device with the corresponding authority according to the parameter.
  • the server 400 also needs to control the current terminal device to perform corresponding operations according to the user's intention in the voice command.
  • the server 400 sends the voice instruction to the current terminal device, and the current terminal device receives the voice instruction and performs a corresponding operation according to the voice instruction.
• For example, suppose the user currently has only a sweeping robot. If the user inputs the voice "sweep the floor", the first terminal device transmits the voice to the server 400. After querying, the server 400 finds that only the sweeping robot has a cleaning function. The server 400 therefore controls the first terminal device to broadcast "the sweeping robot starts to sweep the floor" and controls the sweeping robot to perform the sweeping function. When controlling the broadcast, the server 400 may feed back the parameter representing the device with the corresponding authority to the first terminal device through a long-lived connection.
• The sub-rules in the second group are applied one by one, in order of priority from high to low, to the multiple terminal devices that have the authority to execute the voice command.
  • the second set of rules preset in the server 400 includes a user frequency sub-rule, a distance from the first terminal device sub-rule, and a second terminal device priority sub-rule.
• The usage-frequency sub-rule refers to the number of times the user has executed similar voice commands through a given second terminal device, for example, how often the user plays video assets such as movie A and variety show B through a smart TV, or audio assets such as song C and song D through a smart speaker. Movie A, variety show B, song C, and song D can all be classified as media assets.
  • the distance sub-rule from the first terminal device refers to the distance between each second terminal device and the first terminal device.
  • the second terminal device priority sub-rule refers to the priority of each terminal device set by the user when executing voice commands. For example, corresponding to the voice commands for playing media assets, the priority of smart TVs is higher than that of smart refrigerator screens. Wherein, the priority of the terminal device can be set by the user through the application program on the smart terminal 300 .
• The user can also set priorities for the sub-rules in the second group; for example, the usage-frequency sub-rule can be given a higher priority than the distance sub-rule, and the terminal-device-priority sub-rule a higher priority than the usage-frequency sub-rule.
• The server 400 filters terminal devices with the goal of selecting the single second terminal device most suitable for executing the current voice command; that is, filtering continues until the number of best-matching second terminal devices is 1. After the first group of rules has been applied, as long as the number of remaining terminal devices is not unique, the server 400 applies the sub-rules of the second group one by one, in priority order, to the remaining devices.
• For example, the server 400 finds that the user has a smart TV, a smart speaker, a smart air conditioner, a smart refrigerator, and a smart washing machine. After first-group filtering, the smart TV, smart speaker, and smart refrigerator are found to have a playback function, so these three need further screening through the second group of rules.
• The server 400 then filters through the distance sub-rule. Using the device location information in each device's attributes, the server 400 determines that the smart TV and smart speaker are in the living room with the first terminal device, i.e., relatively close, while the smart refrigerator is in the kitchen, relatively far away, so the smart refrigerator is excluded. Since two candidates remain, the smart TV and the smart speaker, the server 400 continues with the terminal-device-priority sub-rule; the user has set the smart TV's priority higher than the smart speaker's, so the server 400 selects the smart TV as the best-matching terminal device.
• In another example, the server 400 finds that the user has a smart TV, a smart speaker, a smart air conditioner, a smart refrigerator, and a smart electric fan. After first-group filtering, the smart air conditioner and the smart electric fan are found to have a room-temperature adjustment function, so the server 400 further screens them through the second group of rules. Using the terminal-device-priority sub-rule, for room-temperature voice commands the user has set the smart air conditioner's priority higher than the smart electric fan's, so the server 400 selects the smart air conditioner as the best-matching terminal device. Of course, the user can also set the smart electric fan's priority higher than the smart air conditioner's according to their own needs.
• When screening terminal devices with the usage-frequency sub-rule, the server 400 detects the execution frequency of each second terminal device related to the user identifier, where the execution frequency is the number of times that device has executed similar voice commands in its history.
• The server 400 retains the second terminal device with the highest execution frequency.
• When screening with the distance sub-rule, the server 400 detects the distance between each second terminal device related to the user identifier and the first terminal device, and retains the second terminal device closest to the first terminal device.
• When screening with terminal-device priorities, the server 400 retains the second terminal device with the highest priority according to the priorities set by the user.
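The one-by-one screening described in these sub-rules can be sketched as a cascade of reductions; the attribute names (`frequency`, `distance`, `priority`) and the fixed sub-rule order are illustrative assumptions:

```python
# Sketch of second-group filtering: apply sub-rules one by one, from the
# highest-priority sub-rule down, stopping as soon as one device remains.
# Device attributes and the sub-rule order are illustrative assumptions.
def keep_max(devices, key):
    """Retain only the devices whose key value is the maximum."""
    best = max(key(d) for d in devices)
    return [d for d in devices if key(d) == best]

def second_group_filter(devices):
    # 1. usage-frequency sub-rule: keep the most frequently used devices
    devices = keep_max(devices, key=lambda d: d["frequency"])
    if len(devices) == 1:
        return devices[0]
    # 2. distance sub-rule: keep devices closest to the first terminal
    devices = keep_max(devices, key=lambda d: -d["distance"])
    if len(devices) == 1:
        return devices[0]
    # 3. priority sub-rule: keep the user-configured highest priority
    devices = keep_max(devices, key=lambda d: d["priority"])
    return devices[0]
```

With the smart TV / smart speaker example above (equal frequency and distance, TV priority higher), the cascade falls through to the priority sub-rule and selects the smart TV.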
• When the server, in response to a voice command, finds that multiple terminal devices are currently available, it can select the second terminal device that will ultimately execute the command based on the filtering rules, avoiding multiple rounds of voice interaction between the user and the first terminal device and improving the user experience.
• In some embodiments, the present application also provides a voice control method. The method includes: the server 400 receives a voice instruction, including a user identifier, sent by a first terminal device, and searches for all terminal devices related to the user identifier. When there is no related terminal device, the server 400 feeds back a parameter indicating the absence of a terminal device, so that the first terminal device announces that no terminal device can execute the voice command. When there are related terminal devices, the server 400 uses the preset filtering rules to screen out the best-matching terminal device, feeds back parameters representing it so that the first terminal device can announce it, and controls the best-matching terminal device to execute the voice instruction.
  • FIG. 22 is a schematic diagram of a scene of a voice control process provided by an embodiment of the present application.
  • the terminal devices in the smart home scenario include terminal device 200-4 (that is, a smart refrigerator), terminal device 200-3 (that is, a smart washing machine), and terminal device 200-5 (that is, a smart display device).
• The user can use a recording application in the first terminal device (that is, the intelligent terminal 200-1) to record sound, where the sound mainly expresses the user's control intention, and the control intention does not name the second terminal device to be controlled. The intelligent terminal 200-1 sends the user's recorded data to the server 400, so that the server 400 can recognize the voice command, obtain the specific control information corresponding to it, determine from that control information the second terminal device the user actually wants to control, and directly control the second terminal device to execute the corresponding control instruction; that is, voice control of the terminal device is realized through the interaction between the server 400 and the intelligent terminal 200-1. Alternatively, the user can record sound through a recording module in a local server 400, such as an Internet of Things terminal, where again the sound mainly expresses the control intention and does not name the second terminal device. The local server 400 recognizes the voice command, obtains the corresponding control information, determines the second terminal device the user actually wants to control, and directly controls it to execute the corresponding control command.
• In this way, the terminal device can be automatically controlled by voice to execute corresponding control instructions, which makes it convenient for the user to control and use smart home devices and helps improve intelligence and accuracy.
  • a smart home scene may include various terminal devices, and FIG. 22 is only an illustration, and does not specifically limit the type and number of smart devices.
  • the voice control method provided in the embodiment of the present application may be implemented based on a computer device, or a functional module or a functional entity in the computer device.
  • the computer may be a personal computer (personal computer, PC), a server, a mobile phone, a tablet computer, a notebook computer, a mainframe computer, etc., which are not specifically limited in this embodiment of the present application.
• The method recognizes the user's voice command and obtains the corresponding control information, where the control information includes a function category and a control command. According to a pre-established terminal device information table, a first candidate terminal device set corresponding to the function category is determined; then, based on the functional state of each candidate terminal device in the first set, a second candidate terminal device set matching the control command is determined; finally, the second terminal device matching the control command is determined from the second set and controlled to execute the command. Automatically controlling the terminal device by voice in this way is convenient for users to control and use smart home devices and helps improve intelligence and accuracy.
• To describe this solution in more detail, the following description is given in conjunction with FIG. 23A by way of example. It can be understood that the steps involved in FIG. 23A may, in actual implementation, include more or fewer steps, and their order may also differ, as long as the voice control method provided in the embodiment of the present application can be realized.
  • FIG. 23A is a schematic flowchart of a voice control method provided in an embodiment of the present application
  • FIG. 23B is a schematic diagram of a principle of a voice control method provided in an embodiment of the present application.
  • This embodiment is applicable to the situation of controlling each terminal device included in the smart home scene.
• The method of this embodiment can be executed by a voice control apparatus, which can be implemented in hardware and/or software and can be configured in computer equipment.
  • the method specifically includes the following steps:
• S2301. Recognize the user's voice command to obtain control information, where the control information includes a function category and a control command.
  • the voice command can be understood as the data formed after the user records.
  • the control information can be understood as the control intention corresponding to the user's voice command, which includes the function category and control instructions related to the terminal device, but does not include the specific second terminal device to be controlled.
  • Terminal devices can be understood as various devices included in smart home scenarios, such as audio and video equipment, lighting systems, curtain control, air conditioning control, digital theater systems, audio and video servers, and network appliances.
  • the function category can be understood as the category to which the specific functions of the terminal device belong.
  • the category corresponding to the smart TV may include: volume, brightness, video playback scene, and recipe scene.
  • the control instruction can be understood as an operation instruction related to the terminal device, such as opening, closing, playing, and pausing.
• Each terminal device is in a different control state, and ordinarily the user must specify the particular terminal device and its control state to control it. With voice commands that do not specify a terminal device, execution may fail during the voice control process, or the user may need to be guided to supplement information multiple times before the terminal device to be controlled can be determined.
• The execution subject in this embodiment may be a local control device 200 with processing and interaction functions, such as an Internet of Things terminal, or a server 400 that interacts with the smart terminal 300.
• Since the local control device 200 and the server 400 cannot directly obtain the specific information contained in the voice command, the user's voice command must first be recognized, specifically through a speech recognition method and a semantic understanding method; it can also be recognized by a method such as a neural network model or a speech recognition system, which is not specifically limited in this embodiment. After recognition, the control information corresponding to the voice command is obtained.
  • S2302. Determine a first set of candidate terminal devices corresponding to the function category according to the pre-established terminal device information table.
• The terminal device information table can be understood as a pre-established table of information about each terminal device in the smart home scene; the table can include each terminal device's device identification number, device name, function category, function status, and so on.
  • the first set of candidate terminal devices can be understood as a set of terminal devices included in the smart home scene that match the function category.
• By looking up the terminal device information table, the first candidate terminal device set corresponding to the function category in the control information can be obtained.
• The second candidate terminal device set can be understood as the set of terminal devices in the smart home scene that match the control instruction; this set contains the candidates for the second terminal device that the user wants to control.
• The first candidate terminal device set may contain multiple candidate terminal devices, each possibly in a different functional state. Therefore, after obtaining the first set, the scope must be narrowed further to determine the terminal device the user wants to control. By comparing the functional state of each candidate terminal device in the first set with the control instruction in the control information, the second candidate terminal device set, formed by the candidates that match the control instruction, can be obtained.
• For example, suppose the functional state of candidate terminal device 1 is normal, that of candidate terminal device 2 is playing, and that of candidate terminal device 3 is off; then candidate terminal device 3, whose state matches the control instruction, is added to the second candidate terminal device set.
  • the second terminal device may be understood as a terminal device that matches the control instruction.
• Since the second candidate terminal device set may contain multiple terminal devices, the second terminal device matching the control instruction must be determined from it; the number of second terminal devices may also be more than one, depending on the specific circumstances, which this application does not limit. After the second terminal device is determined, the corresponding control command is sent to it so that it executes the command, meeting the user's needs and accurately carrying out control of the second terminal device in accordance with the user's voice command.
  • FIG. 23C is a schematic diagram of the process of determining the second set of candidate terminal devices in the embodiment of the present application, as shown in FIG. 23C:
  • each terminal device is Dev1, Dev2, Dev3, ...;
  • the total set of functional categories is defined as F, and each function is F1, F2, F3, ...;
• the total set of functional states is defined as S, and the functional states are respectively S1, S2, S3, …
• Step 4: Query the sets in step 2, and determine the second candidate terminal device set from the terminal devices whose sets contain an element equal to FxSy.
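The FxSy matching in FIG. 23C can be sketched with set operations; the example device data (which (function, state) pairs each device holds) is an illustrative assumption:

```python
# Sketch of FIG. 23C's set-based matching: each device Dev1, Dev2, ... is
# associated with a set of (function Fx, state Sy) pairs; a command targets
# one pair FxSy, and the second candidate set is every device whose set
# contains that pair. The concrete pairs below are illustrative assumptions.
devices = {
    "Dev1": {("volume", "normal"), ("playback", "playing")},
    "Dev2": {("playback", "off")},
    "Dev3": {("temperature", "cooling")},
}

def second_candidate_set(fx_sy):
    """Devices whose (function, state) set contains the target pair FxSy."""
    return {name for name, pairs in devices.items() if fx_sy in pairs}
```

For instance, a command that targets devices with playback currently off would query `("playback", "off")` and obtain only Dev2.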
• In this embodiment, the user's voice command is first recognized to obtain the corresponding control information, which includes the function category and the control command. Then, according to the pre-established terminal device information table, the first candidate terminal device set is determined; next, based on the functional state of each candidate in the first set, the second candidate terminal device set matching the control instruction is determined; finally, the matching second terminal device is determined from the second set and controlled to execute the command. Automatically controlling the terminal device by voice in this way is convenient for the user to control and use smart home devices, and is conducive to improving intelligence and accuracy.
  • the terminal device information table is obtained in the following manner:
  • the preset scene can be understood as a scene that includes multiple terminal devices and the multiple terminal devices are interconnected through a network, such as a smart home scene, a smart office scene, and the like.
• The device name, function name, function category, and function status of each terminal device in the preset scene can be obtained through terminal-device information reporting, or in other ways. After this information is obtained, a corresponding terminal device information table can be established from all the device names, function names, function categories, and function states; the table can also be updated promptly whenever at least one of the device name, function name, function category, or function status changes.
• In this way, the terminal device information table stays consistent with the actual functional status of each terminal device, which facilitates determining the first candidate terminal device set and ensures the accuracy of that set.
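A minimal sketch of such a table and its reporting-driven update path follows; the field names track the description above (device id, name, function category, function status), while the concrete values are illustrative assumptions:

```python
# Sketch of a terminal-device information table kept in sync by device
# reports. Re-reporting the same device id overwrites the old entry, so the
# table tracks the device's latest function status.
device_table = {}

def report_device(dev_id, name, category, state):
    """Called when a device reports (or re-reports) its information."""
    device_table[dev_id] = {
        "name": name,
        "category": category,
        "state": state,
    }

def first_candidate_set(category):
    """All devices whose reported function category matches the command's."""
    return [dev_id for dev_id, info in device_table.items()
            if info["category"] == category]
```

Because each report overwrites the previous entry, the table stays consistent with the devices' actual states, which is the property the embodiment relies on.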
  • the method further includes:
• If the first candidate terminal device set is an empty set, or if the first set is non-empty but the second candidate terminal device set is empty, second prompt information is sent, where the second prompt information is used to instruct the user to determine the second terminal device from the multiple terminal devices;
  • the second response information includes second identification information corresponding to the second terminal device
• The first candidate terminal device set being empty can be understood as there being no candidate terminal device in the set that meets the conditions; similarly, the second candidate terminal device set being empty means there is no qualified terminal device in that set.
• In either case, the second prompt information can be sent. For example, the local control device 200 can send the second prompt information to its own display screen or audio application to display or play it, instructing the user to determine the second terminal device from the multiple terminal devices; or the server 400 can send the second prompt information to the smart terminal 300 for the same purpose.
• Then the second response information fed back by the user is received; since it includes the second identification information corresponding to the second terminal device, the second terminal device corresponding to that identification information can be directly controlled to execute the control instruction.
  • the second terminal device can be determined through the above method, so as to meet the control requirement of the user and improve the user experience.
  • FIG. 24A is a schematic flow chart of another voice control method provided by the embodiment of the present application
  • FIG. 24B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application.
  • This embodiment is further expanded and optimized on the basis of the foregoing embodiments.
  • a possible implementation of S2304 in this embodiment is as follows:
• Since the second candidate terminal device set may contain multiple second candidate terminal devices, in order to determine the second terminal device matching the control instruction, the number of second candidate terminal devices contained in the set must first be determined.
  • S23042 Determine the second terminal device from the second candidate terminal device set according to the relationship between the quantity and the preset threshold, and control the second terminal device to execute the control instruction.
  • the preset threshold may be a preset value, such as 1, 3, etc., and may also be determined according to specific circumstances, which is not specifically limited in this embodiment.
• The second terminal device is then determined from the set; for example, all of the second candidate terminal devices in the second candidate terminal device set are taken as second terminal devices, or only some of them are. After the second terminal device is determined, it is controlled to execute the control instruction, implementing smart home control by voice and reducing user operations.
  • determining the second terminal device through the above method is simple and quick, and can improve work efficiency.
  • FIG. 25A is a schematic flowchart of another voice control method provided by the embodiment of the present application
  • FIG. 25B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application.
  • This embodiment is further expanded and optimized on the basis of the foregoing embodiments.
  • a possible implementation of S23042 in this embodiment is as follows:
• If the number of second candidate terminal devices is less than or equal to the preset threshold, the number does not exceed the upper limit, so all second candidate terminal devices in the set are determined to be second terminal devices.
  • a control instruction needs to be sent to each second terminal device, so as to control each second terminal device to respectively execute the control instruction.
• If the number exceeds the threshold, first prompt information is sent. For example, the local control device 200 can send the first prompt information to its own display screen or audio application to display or play it, instructing the user to determine the second terminal device from the multiple terminal devices; or the server 400 can send the first prompt information to the smart terminal 300 for the same purpose.
  • the first response information fed back by the user is received, so as to subsequently control the second terminal device corresponding to the first identification information to execute the control instruction.
  • the first response information includes the first identification information corresponding to the second terminal device, it is possible to directly control the second terminal device corresponding to the first identification information to execute the control instruction.
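The threshold branch described in this embodiment (dispatch to all candidates when the count is within the limit, otherwise prompt the user) can be sketched as follows; the tuple-based return values are an illustrative convention, not from the patent:

```python
# Sketch of S23042: branch on the candidate count versus a preset threshold.
# At or below the threshold, every candidate becomes a second terminal device
# and each is sent the control command; above it, the user is prompted to
# pick one (the first response information would carry the chosen device id).
def dispatch(candidates, command, threshold=1):
    if len(candidates) <= threshold:
        # all candidates execute the control instruction
        return [("execute", dev, command) for dev in candidates]
    # too many candidates: emit first prompt information instead
    return [("prompt", candidates, command)]
```

With the default `threshold=1`, a single candidate is commanded directly, while two or more trigger the prompt path.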
  • FIG. 26A is a schematic flowchart of another voice control method provided by the embodiment of the present application
  • FIG. 26B is a schematic diagram of the principle of another voice control method provided by the embodiment of the present application.
  • This embodiment is further expanded and optimized on the basis of the foregoing embodiments.
  • a possible implementation of S2301 in this embodiment is as follows:
  • S23011: Perform text recognition on the voice instruction using a speech recognition method to obtain the text information corresponding to the voice instruction.
  • The speech recognition method is a method for converting speech into text, such as speech recognition software.
  • S23012: Perform semantic understanding on the text information using a semantic understanding method to obtain the control information contained in the text information, where the control information includes a function category and a control instruction.
  • the semantic understanding method may include a keyword extraction method, an information extraction method, and the like.
  • Because the text information recognized by the machine may contain redundant information, repeated information, and the like, the text information is semantically understood through the semantic understanding method to further improve the accuracy of the recognition process, thereby obtaining the control information contained in the text information; the control information includes a function category and a control instruction.
  • The control information obtained in this way is more accurate and closer to the actual situation, which helps ensure that the subsequent process proceeds smoothly.
  • FIG. 26C is a schematic diagram of the process of obtaining control information in the embodiment of the present application, as shown in FIG. 26C:
  • voice recognition is performed on the voice command to obtain the first information
  • semantic understanding of the first information is performed to obtain the control information.
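As a rough illustration of S23011 and S23012, the two-stage pipeline can be mocked as below. The keyword table and function names are invented for this sketch; a real implementation would call a speech recognition engine and an NLU model rather than a dictionary lookup.

```python
# Toy "semantic understanding" table: keyword -> (function category, control instruction).
# Entries mirror the examples discussed later ("too dark", "too loud", "close the door").
FUNCTION_KEYWORDS = {
    "dark": ("brightness", "increase"),
    "loud": ("volume", "lower"),
    "door": ("door", "close"),
}

def speech_to_text(voice_command):
    # S23011 placeholder: a speech recognition engine would transcribe audio here;
    # this sketch assumes the command has already been transcribed to text.
    return voice_command.strip().lower()

def understand(text):
    # S23012: keyword-extraction-style semantic understanding that ignores
    # redundant words and keeps only the control information.
    for keyword, (category, instruction) in FUNCTION_KEYWORDS.items():
        if keyword in text:
            return {"function_category": category, "control_instruction": instruction}
    return None  # no control information recognized

def get_control_info(voice_command):
    return understand(speech_to_text(voice_command))
```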
  • FIG. 27A is a schematic structural diagram of a local control device in the embodiment of the present application, as shown in FIG. 27A:
  • The local control device 200 includes a voice recognition service, a semantic understanding service, and a home control service.
  • The voice recognition service is mainly used to record and recognize the user's voice command to obtain a recognition result.
  • The semantic understanding service is mainly used to determine the control information according to the recognition result.
  • The home control service is used to maintain the terminal device information table, receive the device information reported by the terminal devices, and control the corresponding terminal device according to the control information.
  • FIG. 27B is a schematic structural diagram of interaction between a local control device and a terminal device in the embodiment of the present application, as shown in FIG. 27B:
  • The voice recognition service includes a recording module and a recognition engine; the recording module performs recording, and the recognition engine recognizes the user's voice command to obtain a recognition result.
  • The semantic understanding service determines the function category and the control instruction.
  • The home control service maintains the terminal device information table, determines the second terminal device, and performs voice command control.
  • The home control service interacts with each terminal device (terminal device A, terminal device B, ..., terminal device N). It builds the terminal device information table from the device information reported by each terminal device; determines the first candidate terminal device set according to the terminal device information table and the function category; determines the second candidate terminal device set according to the first candidate terminal device set and the control instruction; determines the second terminal device from the second candidate terminal device set; and controls the second terminal device to execute the control instruction, thereby realizing control of the smart home through voice commands.
  • Each terminal device is responsible for reporting its device information and for receiving and executing the control instructions issued by the home control service.
  • The terminal device information table is shown in Table 1 below:
  • the brightness and volume of the TV are normal and the menu UI is being displayed;
  • the smart speaker is playing music
  • the refrigerator door is open.
  • each device included in Table 1 is a terminal device.
  • Example 1: Suppose the voice command is "too dark", the function category is brightness, the control instruction is increase, and the preset threshold is 1. According to the function category and Table 1, the first candidate terminal device set is determined to be: device 1 (curtain) and device 2 (TV). Then, according to the functional status of each terminal device in the first candidate terminal device set and the control instruction, the second candidate terminal device set is determined to be: device 1 (curtain). Because the number of second candidate terminal devices contained in the second candidate terminal device set is equal to the preset threshold, the second terminal device is determined to be the curtain, and the curtain is controlled to perform the opening function.
  • Example 2: Suppose the voice command is "I want to grill a steak", the function categories are the ingredient scene and the recipe scene, the control instructions are cooking and query, and the preset threshold is 1. According to the function categories and Table 1, the first candidate terminal device set is determined to be: device 4 (TV) and device 6 (oven). Then, according to the functional status of each terminal device in the first candidate terminal device set and the control instructions, the second candidate terminal device set is determined to be: device 6 (oven). Because the number of second candidate terminal devices contained in the set is equal to the preset threshold, the second terminal device is determined to be device 6 (the oven), and the oven is controlled to perform the steak-grilling function.
  • Meanwhile, the TV presents the grilled steak recipe.
  • Example 3: Suppose the voice command is "too loud", the function category is volume, the control instruction is lower, and the preset threshold is 1. According to the function category and Table 1, the first candidate terminal device set is determined to be: device 3 (TV) and device 5 (smart speaker). Then, according to the functional status of each terminal device in the first candidate terminal device set and the control instruction, the second candidate terminal device set is determined to be: device 5 (smart speaker). Because the number of second candidate terminal devices contained in the set is equal to the preset threshold, the second terminal device is determined to be device 5 (the smart speaker), and the smart speaker is controlled to perform the volume-down function.
  • If the number had instead exceeded the preset threshold, the user would be prompted to select whether the device whose volume should be turned down is the TV or the smart speaker.
  • Example 4: Suppose the voice command is "close the door", the function category is door, the control instruction is close, and the preset threshold is 1. According to the function category and Table 1, the first candidate terminal device set is determined to be: device 7 (refrigerator) and device 8 (TV). Then, according to the functional status of each terminal device in the first candidate terminal device set and the control instruction, the second candidate terminal device set is determined to be: device 7 (refrigerator). Because the number of second candidate terminal devices contained in the set is equal to the preset threshold, the second terminal device is determined to be device 7 (the refrigerator), and the refrigerator is controlled to execute the door-closing function.
  • If the number had instead exceeded the preset threshold, the user would be prompted to select whether the device whose door should be closed is the refrigerator or the oven.
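The two filtering steps behind Examples 1 and 3 can be sketched as follows. The table rows, field names, and the boolean "applicable" status model are simplifications assumed for illustration; the embodiment's actual terminal device information table carries richer status information.

```python
# Simplified terminal device information table mirroring Examples 1 and 3:
# device 2's brightness and device 3's volume are already normal, so the
# corresponding control instructions are not applicable to them.
DEVICE_TABLE = [
    {"id": 1, "type": "curtain",       "categories": {"brightness"}, "applicable": {"increase": True}},
    {"id": 2, "type": "tv",            "categories": {"brightness"}, "applicable": {"increase": False}},
    {"id": 3, "type": "tv",            "categories": {"volume"},     "applicable": {"lower": False}},
    {"id": 5, "type": "smart speaker", "categories": {"volume"},     "applicable": {"lower": True}},
]

def first_candidates(table, function_category):
    # Step 1: keep devices whose reported capabilities cover the function category.
    return [d for d in table if function_category in d["categories"]]

def second_candidates(candidates, control_instruction):
    # Step 2: keep devices whose current functional status still allows the instruction.
    return [d for d in candidates if d["applicable"].get(control_instruction, False)]
```

Filtering for "too dark" (brightness/increase) narrows devices 1 and 2 down to the curtain, matching Example 1.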
  • An embodiment of the present application provides an electronic device that includes at least a processor and a memory. When executing the computer program stored in the memory, the processor implements the voice control method for a terminal device according to any one of the foregoing embodiments.
  • An embodiment of the present application provides a computer-readable non-volatile storage medium storing a computer program which, when executed by a processor, implements the voice control method for a terminal device according to any one of the foregoing embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present application discloses a terminal device and a server for voice control. The server of this embodiment receives a voice signal transmitted by a first terminal device, generates a voice command from the voice signal, and returns the voice command to the first terminal device. If the first terminal device can perform the operation corresponding to the voice command, the corresponding operation is performed in response to the voice command. If the first terminal device cannot perform the operation corresponding to the voice command, a command distribution request is transmitted to the server. The server transmits the voice command to a second terminal device according to the command distribution request, so that the second terminal device performs the corresponding operation in response to the voice command, the second terminal device being capable of performing the operation corresponding to the voice command.
PCT/CN2022/100547 2021-06-22 2022-06-22 Terminal device and server for voice control WO2022268136A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202280038248.XA 2021-06-22 2022-06-22 Terminal device and server for voice control

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
CN202110688867.0A 2021-06-22 2021-06-22 Voice control method for terminal device, terminal device, and server
CN202110688867.0 2021-06-22
CN202110917713.4A 2021-08-11 2021-08-11 Server and voice control method
CN202110917713.4 2021-08-11
CN202111521226.2A 2021-12-13 2021-12-13 Server, smart home system, and multi-device voice wake-up method
CN202111521226.2 2021-12-13
CN202210151526.4A 2022-02-18 2022-02-18 Smart home control method and apparatus, computer device, and medium
CN202210151526.4 2022-02-18

Publications (1)

Publication Number Publication Date
WO2022268136A1 true WO2022268136A1 (fr) 2022-12-29

Family

ID=84544127

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100547 WO2022268136A1 (fr) 2021-06-22 2022-06-22 Dispositif terminal et serveur pour commande vocale

Country Status (2)

Country Link
CN (1) CN117882130A (fr)
WO (1) WO2022268136A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116009748A (zh) * 2023-03-28 2023-04-25 深圳市人马互动科技有限公司 Method and apparatus for picture information interaction in children's interactive stories

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818782A * 2016-09-12 2018-03-20 上海声瀚信息科技有限公司 Method and system for intelligent control of household appliances
CN108766432A * 2018-07-02 2018-11-06 珠海格力电器股份有限公司 Method for controlling cooperative operation among household appliances
CN109474843A * 2017-09-08 2019-03-15 腾讯科技(深圳)有限公司 Method, client, and server for controlling a terminal by voice
US20190304466A1 * 2018-03-30 2019-10-03 Boe Technology Group Co., Ltd. Voice control method, voice control device and computer readable storage medium
US20190318736A1 * 2018-04-11 2019-10-17 Baidu Online Network Technology (Beijing) Co., Ltd Method for voice controlling, terminal device, cloud server and system
CN111722824A * 2020-05-29 2020-09-29 北京小米松果电子有限公司 Voice control method, apparatus, and computer storage medium
CN111883129A * 2020-08-03 2020-11-03 海信视像科技股份有限公司 Terminal device control method, apparatus, and terminal device
CN112017652A * 2019-05-31 2020-12-01 华为技术有限公司 Interaction method and terminal device
CN113450792A * 2021-06-22 2021-09-28 海信视像科技股份有限公司 Voice control method for terminal device, terminal device, and server


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116009748A (zh) * 2023-03-28 2023-04-25 深圳市人马互动科技有限公司 Method and apparatus for picture information interaction in children's interactive stories
CN116009748B (zh) * 2023-03-28 2023-06-06 深圳市人马互动科技有限公司 Method and apparatus for picture information interaction in children's interactive stories

Also Published As

Publication number Publication date
CN117882130A (zh) 2024-04-12

Similar Documents

Publication Publication Date Title
US10755706B2 (en) Voice-based user interface with dynamically switchable endpoints
KR102480949B1 Coordinating signal processing among digital voice assistant computing devices
JP6516585B2 Control device, method thereof, and program
CN105471705B Instant-messaging-based intelligent control method, device, and system
WO2018039814A1 Smart home control method, apparatus, and system
WO2019205134A1 Smart home voice control method, apparatus, device, and system
US11782590B2 Scene-operation method, electronic device, and non-transitory computer readable medium
KR102551715B1 Generating IoT-based notifications and providing commands that cause automated assistant client(s) of client device(s) to automatically render the IoT-based notifications
JP2018531404A6 History-based key phrase suggestions for voice control of a home automation system
JP2018531404A History-based key phrase suggestions for voice control of a home automation system
CN114172757A Server, smart home system, and multi-device voice wake-up method
CN114067798A Server, smart device, and intelligent voice control method
CN111665737A Smart home scene control method and system
US20200213653A1 Automatic input selection
CN111817936A Control method and apparatus for smart home devices, electronic device, and storage medium
WO2022268136A1 Terminal device and server for voice control
CN111367188A Smart home control method and apparatus, electronic device, and computer storage medium
CN114155855A Voice recognition method, server, and electronic device
CN113450792A Voice control method for terminal device, terminal device, and server
WO2024108905A1 Server, smart device, and smart device control method
CN116566760B Smart home device control method and apparatus, storage medium, and electronic device
WO2018023514A1 Home background music control system
EP3557574A1 Voice control method, server, and voice interaction system
CN114035444B Smart home control method
CN116582381B Smart device control method and apparatus, storage medium, and smart device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22827628

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202280038248.X

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE