CN114067798A - Server, intelligent equipment and intelligent voice control method - Google Patents

Info

Publication number
CN114067798A
CN114067798A (application CN202111521241.7A)
Authority
CN
China
Prior art keywords
intelligent
voice
control instruction
voice control
equipment
Prior art date
Legal status
Pending
Application number
CN202111521241.7A
Other languages
Chinese (zh)
Inventor
张立泽 (Zhang Lize)
李金凯 (Li Jinkai)
王建君 (Wang Jianjun)
Current Assignee
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hisense Visual Technology Co Ltd
Priority to CN202111521241.7A
Publication of CN114067798A
Legal status: Pending

Classifications

    • G10L 15/22 — Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
    • G01S 5/18 — Position-fixing by co-ordinating two or more direction or position line determinations or two or more distance determinations, using ultrasonic, sonic or infrasonic waves
    • G10L 15/30 — Speech recognition; distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L 2015/223 — Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application provides a server, a smart device and an intelligent voice control method. After a user inputs a voice control instruction, the instruction is first sent to the server; the server extracts user intention information from the instruction and matches, in a mapping database, the smart devices capable of realizing that intention so as to generate a target device list. The server then searches the target device list for the executing device, i.e. the device closest to the user or with the highest execution weight, and issues the voice control instruction to it so that it responds to the instruction input by the user. In this way, the executing device that responds is determined from the user-intention matching result and the sound source localization result, so that when the smart home system contains multiple smart devices able to respond to the voice control instruction, only the executing device responds, alleviating control confusion and the situation where a device responds but its feedback cannot support the requested control service.

Description

Server, intelligent equipment and intelligent voice control method
Technical Field
The application relates to the technical field of smart homes, and in particular to a server, a smart device and an intelligent voice control method.
Background
Intelligent voice control is a new interaction mode: semantic recognition is performed on the voice information input by a user, and the device is then controlled to operate according to the recognition result. To realize this interaction, an intelligent voice system can be built into the smart device. The intelligent voice system may consist of a hardware part and a software part. The hardware part mainly comprises a microphone, a loudspeaker and a controller, and is used to receive, feed back and process voice information; the software part mainly comprises a voice conversion module, a natural language processing module and a control module, and is used to convert the input sound signal into text and form a specific control instruction.
When a user uses the intelligent voice system, the smart device detects the voice input by the user through the hardware part, processes the input voice by calling the software part to convert it into a control instruction, responds to the control instruction with the corresponding control action, and finally feeds back the execution result through the hardware part. For example, an intelligent voice system may be built into a smart television; when the user says "I want to watch a movie" into the microphone, the smart television is triggered to play movie media for the user to watch.
Intelligent voice control can also be applied to a smart home system composed of multiple smart devices. For example, the smart home system may include smart devices such as a smart television, a smart speaker and a smart refrigerator that establish communication connections with one another. Because an intelligent voice system can be built into each of these devices, several of them may respond simultaneously to the user's voice input, causing control confusion. If the user says "I want to watch a movie", both the smart television and the smart speaker respond and feed back different execution results: the smart television jumps to a movie-playing interface while the smart speaker feeds back an error prompt such as "I cannot do that", degrading the user's operating experience.
Disclosure of Invention
The application provides a server, a smart device and an intelligent voice control method, and aims to solve the problem of control confusion caused by multiple smart devices responding simultaneously to a user's voice.
In a first aspect, the present application provides a server comprising a storage module, a communication module and a control module. The storage module is configured to store a mapping database containing mapping relations between user intention information and device capability information; the communication module is configured to establish communication connections with a plurality of smart devices; the control module is configured to perform the following program steps:
receiving a voice control instruction sent by intelligent equipment;
extracting user intention information from the voice control instruction in response to the voice control instruction;
generating a target device list, wherein the target device list comprises intelligent devices which are obtained by matching the mapping database and can realize the user intention information;
searching for an executing device in the target device list, wherein the executing device is an intelligent device closest to the voice control instruction initiating position;
and sending the voice control instruction to the execution equipment to trigger the execution equipment to execute a voice response aiming at the voice control instruction.
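For illustration, the following is a minimal Python sketch of the control-module flow above; all names, data shapes and the toy intent extractor are assumptions for illustration, not the patent's actual implementation.

    INTENT_CAPABILITY = {            # mapping database: intention -> capability
        "video_topic": "video_play",
        "music_topic": "music_play",
    }

    def extract_intent(text):
        """Toy stand-in for the word-processing model described later."""
        return "video_topic" if "movie" in text else "music_topic"

    def handle_instruction(instruction, devices):
        """Receive an instruction, build the target device list, pick the nearest."""
        intent = extract_intent(instruction["text"])
        capability = INTENT_CAPABILITY[intent]
        # Target device list: smart devices able to realize the user intention.
        targets = [d for d in devices if capability in d["capabilities"]]
        # Executing device: the target closest to where the instruction was spoken.
        executing = min(targets, key=lambda d: d["distance_m"])
        return executing             # the server then issues the instruction to it

    devices = [
        {"name": "smart_tv", "capabilities": {"video_play", "music_play"}, "distance_m": 3.2},
        {"name": "smart_speaker", "capabilities": {"music_play"}, "distance_m": 1.1},
    ]
    print(handle_instruction({"text": "i want to watch a movie"}, devices)["name"])  # smart_tv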
In a second aspect, the present application further provides a smart device, including: audio input device, audio output device, communicator and controller. Wherein the audio input device is configured to detect voice audio data input by a user; the audio output device is configured to play a voice response; the communicator is configured to establish a communication connection with a server; the controller is configured to perform the following program steps:
acquiring voice audio data input by a user;
generating a voice control instruction according to the voice audio data;
sending the voice control instruction to the server so that the server extracts user intention information in the voice control instruction and matches intelligent equipment capable of realizing the user intention information in the mapping database to generate a target equipment list; searching for an executing device in the target device list and sending a voice control instruction;
and executing voice response aiming at the voice control instruction.
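As a companion sketch, the device-side steps above might look as follows, assuming (purely for illustration) that the device posts a JSON instruction to the server over HTTP; the URL, field names and transport are not specified by the patent.

    import json
    import urllib.request

    def report_voice_instruction(text, device_id, server_url):
        # Package the recognized speech as a voice control instruction,
        # tagged with this device's identification information.
        payload = json.dumps({"device_id": device_id, "text": text}).encode("utf-8")
        req = urllib.request.Request(server_url, data=payload,
                                     headers={"Content-Type": "application/json"})
        # The server extracts the intention, builds the target device list and
        # issues the instruction only to the chosen executing device.
        with urllib.request.urlopen(req) as resp:
            return resp.read()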
In a third aspect, the present application further provides an intelligent voice control method, which is applied to an intelligent home system, where the intelligent home system includes a server and an intelligent device, and a communication connection is established between the server and the intelligent device; the intelligent voice control method comprises the following steps:
the intelligent equipment acquires voice audio data input by a user and generates a voice control instruction according to the voice audio data;
the intelligent equipment sends the voice control instruction to the server;
the server extracts user intention information from the voice control instruction and matches intelligent equipment capable of realizing the user intention information in the mapping database to generate a target equipment list;
the server searches the execution equipment in the target equipment list and issues a voice control instruction to the execution equipment;
and the intelligent device as an execution device executes voice response aiming at the voice control instruction.
According to the above technical solutions, with the server, the smart device and the intelligent voice control method provided by the application, a voice control instruction input by the user is first sent to the server; the server extracts user intention information from the instruction and matches, in the mapping database, the smart devices capable of realizing that intention so as to generate a target device list. The server then searches the target device list for the executing device, i.e. the device closest to the user or with the highest execution weight, and issues the voice control instruction to it so that it responds to the user's input. In this way, the executing device is determined from the user-intention matching result and the sound source localization result, so that when the smart home system contains multiple smart devices able to respond to the voice control instruction, only the executing device responds, alleviating both control confusion and the situation where a device responds but its feedback cannot support the requested control service.
Drawings
In order to explain the technical solution of the present application more clearly, the drawings needed in the embodiments are briefly described below; other drawings can also be obtained from these drawings by those skilled in the art without creative effort.
Fig. 1 is a usage scenario of an intelligent home system in an embodiment of the present application;
fig. 2 is a hardware configuration diagram of an intelligent device in an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a voice interaction process in an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating response voice interaction effects of a plurality of smart devices according to an embodiment of the present application;
fig. 5 is a schematic flowchart of an intelligent voice control method on the intelligent device side in the embodiment of the present application;
fig. 6 is a schematic flow chart of a server-side intelligent voice control method in the embodiment of the present application;
FIG. 7 is a flowchart illustrating a lookup execution apparatus according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an online process of an intelligent device in an embodiment of the present application;
FIG. 9 is a timing diagram illustrating an embodiment of a server-side intelligent speech control method;
fig. 10 is a timing diagram of an intelligent voice control method on the intelligent device side in the embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described below do not represent all embodiments consistent with the present application; they are merely examples of systems and methods consistent with certain aspects of the application, as recited in the claims.
In the embodiments of the present application, the smart home system is a network system established on a specific local network under a unified control service, and may include a plurality of smart devices 200 that establish communication connections with one another. The smart devices 200 may access the same local area network to realize the communication connections, or may directly form a peer-to-peer network through a unified communication protocol. For example, several smart devices 200 may communicate with each other by connecting to the same wireless local area network, or one smart device 200 may establish communication connections with several others through Bluetooth, infrared, a cellular network, power-line carrier communication and the like.
The smart device 200 is a device having a communication function, and capable of receiving, transmitting, and executing a control command and implementing a specific function. The smart device 200 includes, but is not limited to, a smart display device, a smart terminal, a smart home appliance, a smart gateway, a smart lighting device, a smart audio device, a game device, and the like. The plurality of smart devices 200 constituting the smart home system may be the same type of device or different types of devices. For example, as shown in fig. 1, in the same smart home system, a smart television, a smart sound box, a smart refrigerator, a plurality of smart light fixtures, and the like may be included. The smart devices 200 may be distributed in different locations to meet the usage requirements at the corresponding locations.
It should be noted that, the smart home system described in the present application does not limit the application range of the scheme to be protected in the present application. In other words, in practical applications, the server, the intelligent device, and the intelligent voice control method provided by the present application are not limited to being applied in the field of smart homes, and are also applicable to other systems supporting intelligent voice control, such as an intelligent office system, an intelligent service system, an intelligent management system, and an industrial production system.
The smart device 200 has a specific hardware configuration according to the actual function of the smart device 200. As shown in fig. 2, taking a display apparatus as an example, the smart device 200 having a display function may include at least one of a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface.
In some embodiments, the controller 250 includes a central processor, a video processor, an audio processor, a graphic processor, a RAM, a ROM, first to nth interfaces for input/output.
In some embodiments, the display 260 includes a display screen component for presenting pictures and a driving component for driving image display; it receives image signals output from the controller and displays video content, image content, menu manipulation interfaces and user manipulation UI elements.
In some embodiments, the display 260 may be at least one of a liquid crystal display, an OLED display, and a projection display, and may also be a projection device and a projection screen.
In some embodiments, the tuner-demodulator 210 receives broadcast television signals via wired or wireless reception and demodulates audio/video signals, as well as EPG data signals, from a plurality of wireless or wired broadcast television signals.
In some embodiments, the external device interface 240 may include, but is not limited to, the following: high Definition Multimedia Interface (HDMI), analog or data high definition component input interface (component), composite video input interface (CVBS), USB input interface (USB), RGB port, and the like. The interface may be a composite input/output interface formed by the plurality of interfaces.
In some embodiments, the controller 250 controls the operation of the smart device and responds to user actions through various software control programs stored in memory. The controller 250 controls the overall operation of the smart device 200. For example: in response to receiving a user command for selecting a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.
In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on display 260, and the user input interface receives the user input commands through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.
In some embodiments, the smart device 200 is also in data communication with the server 400. The smart device 200 may be allowed to communicatively connect through a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks. The server 400 may provide various content and interactions to the smart device 200. The server 400 may be a cluster, or may be a plurality of clusters, and may include one or more types of server groups.
In some embodiments, the smart device 200 may have a smart voice system built in to support smart voice control by the user. Smart voice control refers to an interactive process in which the user operates the smart device 200 by inputting voice audio data. To implement it, the smart device 200 may include an audio input device and an audio output device. The audio input device collects the voice audio data input by the user, and may be a microphone built into or externally connected to the smart device 200. The audio output device produces sound to play voice responses. For example, as shown in Fig. 3, when the user says a wake-up word such as "Hi, small X" into the audio input device, the smart device 200 may play an "I am here" voice response through the audio output device to guide the user through the subsequent voice input.
In some embodiments, the smart voice system built into the smart device 200 also supports a "one-shot" mode, in which the user can trigger a control function directly with a single short voice input. In the conventional mode, if the user wants the smart device 200 to play a movie resource, the user must first say "Hi, small X", wait for the smart device 200 to feed back "I am here", and then say "I want to watch a movie", whereupon the smart device 200 feeds back "Found the following movies for you". In the "one-shot" mode, the user can directly say "Hi, small X, I want to watch a movie", and the smart device 200 feeds back "Found the following movies for you" immediately after receiving the voice instruction, reducing the number of voice interactions and improving voice interaction efficiency.
For multiple smart devices 200 in the same smart home system, the user can control linkage of several devices through smart voice. For example, the user may say the voice command "turn on the bedroom lamp" to the smart speaker; in response, the smart speaker generates a turn-on control command and sends it to the lamp named "bedroom" in the smart home system to turn it on. Meanwhile, the smart speaker also responds to the user's voice input by playing feedback content such as "turning on the bedroom lamp for you".
When linkage control is performed among multiple smart devices 200, the control instruction can be transmitted to the controlled device directly by the smart device 200 that received the user's voice audio data, or transmitted first to a relay device such as a router and then forwarded by the relay device to the controlled device. In some embodiments, the control instruction may also be transmitted to the controlled device through the server 400. For example, when a user outside the local area network of the smart home controls a smart device 200 through a smart terminal 300, the smart terminal 300 first sends the control instruction to the server 400, and the server 400 then transmits it to the smart device 200.
In order to control the smart devices 200 in the smart home system, the server 400 may issue control instructions and related data individually to any smart device 200. For example, for a display device, the user may request online playback of media assets through interactive operation, and the server 400 feeds the media asset data back to the display device according to the playback request. For linkage control of multiple smart devices 200, the server 400 may issue control instructions and related data to the smart home system in a unified manner. For example, when the user turns on the bedroom lamp through the smart speaker, the smart speaker sends the user's control instruction to the server 400, and the server 400 sends feedback data to the smart home system, which issues the turn-on instruction to the bedroom lamp and feeds back a control response to the smart speaker.
Some smart devices 200 in the smart home system may have a complete smart voice system built in; such a device can serve as a main control device that independently receives, processes and responds to voice audio, and can send the control instruction corresponding to the voice to other smart devices 200. For example, a complete smart voice system may be built into a display device, a smart speaker or a smart refrigerator to receive the voice input by the user. Other smart devices 200 in the system may lack a complete smart voice system and act only as controlled devices that receive the control instructions sent by a main control device. For example, lamps and small household appliances may receive a control instruction transmitted by a display device serving as the main control device, and start, stop or change their operating parameters accordingly.
As devices become increasingly intelligent, more and more types of smart device 200 can host a complete smart voice system, so the same smart home system may contain several smart devices 200 that can act as main control devices. A voice control instruction input by the user may then be responded to by several smart devices 200 at once, which not only wastes computing resources but also confuses the control process.
For example, when a smart television and a smart speaker, each with a complete built-in smart voice system, are both connected to the smart home system and the user says "I want to watch a movie", both devices receive the voice content and make their own interactive responses. As shown in Fig. 4, the smart television feeds back "Found the following movies for you" and displays a movie list for the user to select from, while the smart speaker, which does not support movie playing, feeds back "I don't understand what you said" or "turning on the television for you" and tries to start the smart television. The two devices play their voice responses at the same time, so the information the user hears is jumbled. Moreover, because the smart television is already on and responding to the input voice, the turn-on operation issued by the smart speaker cannot take effect, which easily causes erroneous control and fails to meet the user's interaction needs.
It should be noted that voice instructions can be input in several forms, such as remote-controller input and far-field voice input. Remote-controller input means that the user presses a voice key on a remote controller associated with the smart device 200 and speaks into the microphone on the remote controller. Since the remote controller is usually an accessory of one specific smart device 200, that associated device can directly respond to the voice instruction. For example, when the user wants to voice-control the smart television, a voice instruction can be input through the remote controller paired with it, and the smart television directly makes the voice response. With this input mode the smart device 200 paired with the remote controller is unambiguously the responding device, so the problem of confused voice responses generally does not occur.
When the user interacts through far-field voice input, however, all smart devices 200 with the voice interaction function can detect the voice input and may each respond, causing confusion of voice responses. To this end, in some embodiments the user may set a different wake-up word for each smart device 200, e.g., the wake-up word of the smart television is "Hi, small X" while that of the smart speaker is "small X". When the user says a wake-up word, the smart device 200 corresponding to it acts as the responding device, alleviating the control confusion.
However, determining the responding device through different wake-up words not only requires the user to remember the correspondence between each device and its wake-up word; smart devices 200 from the same manufacturer are usually built with the same smart voice system and therefore share the same wake-up word, so control confusion still occurs among such devices.
In order to alleviate the control confusion, some embodiments of the present application provide an intelligent voice control method applicable to a smart home system composed of a server 400 and multiple smart devices 200, which prevents several smart devices 200 from responding to the same voice instruction. To implement the method, the server 400 in the smart home system includes at least a storage module 410, a communication module 420 and a control module 430. The storage module 410 is configured to store a mapping database containing mapping relations between user intention information and device capability information. The communication module 420 is configured to establish communication connections with the smart devices 200 so as to issue control instructions and related data to them. The control module 430 is configured to execute the server-side program steps of the intelligent voice control method to determine the executing device among the smart devices 200. Likewise, each smart device 200 in the smart home system includes at least an audio input device, an audio output device, a communicator 220 and a controller 250. The audio input device is configured to detect voice audio data input by the user; the audio output device is configured to play voice responses; the communicator 220 is configured to establish a communication connection with the server 400 so as to upload voice control instructions to the server 400 and receive the instructions it issues; and the controller 250 is configured to execute the device-side program steps of the method to complete the response of the smart voice control process.
As shown in Figs. 5 and 6, the intelligent voice control method includes the following steps:
the smart device 200 acquires voice audio data input by a user. When a user is in an intelligent home system environment, voice input can be performed in real time, and a voice sound signal input by the user can be converted into an electric signal by an audio input device built in the intelligent device 200, and voice audio data can be obtained through a series of signal processing methods such as noise reduction, amplification, coding and conversion.
When performing voice interaction, a user may input voice audio data in a variety of ways. That is, in some embodiments, the user may input voice audio data through an audio input device built into the smart device 200. For example, a user may input speech "hi!through a microphone device built into smart device 200! Small x, i want to watch a movie ", the microphone may convert the voice sound signal into an electrical signal and pass it to the controller 250 for subsequent processing.
To trigger the smart device 200 for smart voice control, in some embodiments, the user may also carry a specific wake-up word in the input voice audio data. The wake-up word is a piece of speech containing specific content, such as "hi! Small x, small x, black ground, black, and the like! Xxx ", and the like. For the process of inputting voice and audio data by the user, especially for the process of inputting voice and audio data by the far-field microphone built in the smart device 200, the smart device 200 may determine whether the voice input by the user contains a wake-up word, and perform subsequent processing after detecting the wake-up word, so as to alleviate the false triggering of the smart voice control process.
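A minimal sketch of such wake-word gating, assuming the recognized text is already available; the wake words are merely the examples used in this description.

    WAKE_WORDS = ("hi, small x", "small x")

    def passes_wake_gate(recognized_text):
        """Continue processing only if the utterance starts with a wake word."""
        return recognized_text.lower().startswith(WAKE_WORDS)

    print(passes_wake_gate("Hi, small X, I want to watch a movie"))  # True
    print(passes_wake_gate("turn it up"))                            # False -> ignored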
After acquiring the voice audio data, the smart device 200 generates a voice control instruction from it. The voice control instruction is a control instruction with a specific format, including a control action function, a control object code and the like. On receiving the voice audio data, the smart device 200 may perform text conversion through the voice processing module of its smart voice system, i.e., convert the waveform data in the voice audio data into text data through acoustic feature extraction.
After conversion to text data, the smart device 200 may use a word-segmentation tool to turn the unstructured text into structured text. That is, by means of lexicon matching and the like, the smart device 200 removes text content without practical meaning, such as modal particles and auxiliary words, retains the keywords in the text data, and separates the keywords according to word sense to obtain the structured text.
After obtaining the structured text data, the smart device 200 may feed the structured text into a word-processing model, an artificial intelligence model based on machine learning. Given input text data, the model computes the classification probability that the text information belongs to a particular semantic. Using the various control instructions as classification labels, the model outputs the classification probability of the text data for each control instruction; the control instruction with the highest probability is the one corresponding to the voice audio data.
The smart device 200 may obtain the word-processing model by repeatedly training an initial model with sample data, i.e., labelled text information, under set input/output rules. During training, sample data is taken as input and the classification probability as output; the output result is compared with the label in the sample data to obtain the training error, and the error is back-propagated, i.e., the model parameters are adjusted according to it. By repeatedly feeding in a large amount of sample data, a word-processing model that accurately outputs recognition results is obtained.
After the model computation, the smart device 200 has converted the voice audio data input by the user into a voice control instruction. Once converted, the controlled device or the server 400 can process the voice control instruction directly on receipt, e.g., execute the control action it specifies or extract the user intention from it.
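A toy version of this pipeline, purely for illustration: filler words are stripped, keywords are kept, and the text is scored against each instruction label. The real system uses a trained model; this sketch only shows the "highest classification probability wins" selection, and all word lists are invented.

    FILLER = {"i", "to", "a", "the", "want", "please"}

    LABEL_CUES = {                      # classification labels -> cue keywords
        "play_movie": {"watch", "movie", "film"},
        "play_music": {"listen", "music", "song"},
    }

    def structure(text):
        """Keep only keywords, discarding words without practical meaning."""
        return {w for w in text.lower().split() if w not in FILLER}

    def classify(text):
        keywords = structure(text)
        scores = {label: len(keywords & cues) for label, cues in LABEL_CUES.items()}
        return max(scores, key=scores.get)  # label with the highest score

    print(classify("I want to watch a movie"))  # -> play_movie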
Of course, in some embodiments the smart device 200 may send the voice audio data itself as the voice control instruction. That is, for a smart device 200 with low data processing capability, or without a complete built-in smart voice system, the device may simply forward the audio data and let the server 400 or another smart device 200 perform the language processing, relieving the computational load on the current smart device 200.
After generating the voice control instruction, the smart device 200 may send it to the server 400 so that the server 400 can screen the smart devices 200 in the current smart home system according to the instruction. In some embodiments, the smart device 200 may first examine the voice audio data input by the user to detect whether it contains explicit responding-device information. For example, when the voice instruction contains a wake-up word and only one smart device 200 supports voice interaction with that wake-up word, the current voice audio data is judged to contain explicit responding-device information. In that case, the smart device 200 need not report the voice control instruction to the server 400 for device screening, but transmits it directly, through the local smart home system, to the smart device 200 using that wake-up word for the voice control response.
When this content check determines that the voice audio data input by the user does not explicitly indicate the responding device, the smart device 200 sends the voice control instruction to the server 400 so that the server 400 can screen for the responding device.
Because the smart home system may contain several smart devices 200 with built-in smart voice systems, all of them may detect the voice audio data when the user speaks. To avoid repeated data transmission, after receiving one voice control instruction the server 400 may suspend the generation and reporting of that instruction on the other smart devices 200. For example, after the smart television sends the voice control instruction to the server 400, the server 400 may send a pause instruction to the smart speaker and the smart refrigerator in the same smart home system; on receiving it, both stop generating and sending the voice control instruction.
In some embodiments, for accurate control the server 400 may also watch, within a specific detection period after receiving a voice control instruction, for further instructions with the same content. If it receives several instructions with the same content but from different smart devices 200 within the period, it determines whether those devices belong to the same smart home system. If they do, the server 400 performs the subsequent operations once for the identical instructions; for smart devices 200 belonging to different smart home systems, the server 400 performs the subsequent operations for each voice control instruction separately.
To enable this judgment, the smart device 200 may add its device identification information to the voice control instruction when generating it. The identification information may include the device name, device model, device type, device network address and the like, and may be written into the voice control instruction under a specific encoding rule; a specific information code may also be written into the instruction in place of the identification information.
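One possible shape for such an instruction, shown as a Python dictionary; the field names and values are assumptions for illustration, not an encoding defined by the patent.

    instruction = {
        "device_name": "living_room_tv",
        "device_model": "XX-55",
        "device_type": "smart_tv",
        "device_address": "192.168.1.20",
        "text": "i want to listen to music",
    }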
After receiving the voice control instruction transmitted by the smart device 200, the server 400 may extract the user intention information from it. The user intention can be determined from the control action and the control object carried in the instruction. For example, when the voice control instruction is "play movie", the current user's intention is determined to be video playback.
Since multiple voice control instructions may correspond to the same user intention, several intention topics characterizing user intentions may be built into the server 400 to reduce the amount of data matching. After receiving a voice control instruction, the server 400 determines the topic the instruction belongs to from its control action and control object. For example, the server 400 may store three user intention topics, video_topic, music_topic and fridge_control_topic. When the received instruction content is "play movie", the user intention is determined to be video_topic; likewise, when the instruction content is "play a TV series", the intention is also video_topic. When the instruction content is "play music", the intention is music_topic.
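A sketch of this topic mapping; the phrase table is an illustrative assumption.

    TOPIC_PHRASES = {
        "video_topic": ("play movie", "play a tv series"),
        "music_topic": ("play music",),
        "fridge_control_topic": ("temperature in the refrigerator",),
    }

    def intention_topic(command):
        for topic, phrases in TOPIC_PHRASES.items():
            if any(p in command.lower() for p in phrases):
                return topic
        return "unknown_topic"

    print(intention_topic("play movie"))        # video_topic
    print(intention_topic("play a TV series"))  # video_topic
    print(intention_topic("play music"))        # music_topic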
The user intention information may also be obtained through a word-processing model built into the server 400. That is, in some embodiments, after obtaining the voice control instruction the server 400 may convert the voice audio data into text information and feed the text into the word-processing model, obtaining the model's recognition result for the text, i.e., the classification probabilities of the text against the preset user intention labels. The user intention label with the highest classification probability is extracted as the user intention information. This word-processing model is likewise an artificial intelligence model based on machine learning.
In this embodiment, the data processing capability of the server 400 is used to process the voice audio data corresponding to the voice control instruction, which suits the case where the instruction is raw voice audio data forwarded directly by the smart device 200 and relieves the computational load of the smart device 200. In addition, a dynamic monitoring mechanism may be established to monitor the computational load of the smart device 200 and the server 400 in real time, so that whichever side is less loaded performs the word processing and computing resources are used sensibly. That is, if the smart device 200 is lightly loaded and the server 400 heavily loaded, the smart device 200 performs the word processing and generates a voice control instruction that directly contains the user intention information; conversely, if the server 400 is lightly loaded and the smart device 200 heavily loaded, the smart device 200 sends the voice audio data directly to the server 400, and the server 400 performs the word processing to determine the user intention information.
After extracting the user intention information, the server 400 may match, in the mapping database, the smart devices 200 capable of realizing the user intention so as to generate the target device list. The mapping database records the mapping relations between user intention information and device capability information, i.e., for a specific user intention, which smart devices 200 have the device capabilities to realize it.
The device capability information may likewise define several capability topics corresponding to the user intention topics. For example, corresponding to the intention video_topic, a smart device 200 capable of realizing the intention should support the video_play capability; corresponding to music_topic, the music_play capability; corresponding to fridge_control_topic, the fridge_control capability. The same smart device 200 may support several capability topics depending on its hardware; a smart television, for instance, may support both video_play and music_play.
After obtaining the user intention information, the server 400 may match in the mapping database the device capability that satisfies the intention, and then determine the smart devices 200 able to realize the intention through the support relation between device capabilities and smart devices 200.
In some embodiments, to generate the target device list the server 400 may obtain the current network device table after extracting the user intention information. The network device table contains the information of all smart devices in the smart home system where the smart device 200 resides. The server 400 traverses the device capability information required by the user intention to obtain the target capability, then searches the network device table for the smart devices 200 having the target capability to generate the target device list.
For example, for the voice control instruction "I want to listen to music", the user intention is determined to be music_topic and the required device capability matched to it is music_play; matching within the current smart home system finds that the smart devices 200 having the music_play capability are the smart television and the smart speaker, so the server 400 generates a target device list containing those two devices.
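A sketch of this matching step, with assumed names and data shapes:

    MAPPING_DB = {"music_topic": "music_play", "video_topic": "video_play"}

    network_device_table = [
        {"name": "smart_tv",      "capabilities": {"video_play", "music_play"}},
        {"name": "smart_speaker", "capabilities": {"music_play"}},
        {"name": "smart_fridge",  "capabilities": {"fridge_control"}},
    ]

    def target_device_list(intent):
        target_capability = MAPPING_DB[intent]          # e.g. "music_play"
        return [d["name"] for d in network_device_table
                if target_capability in d["capabilities"]]

    print(target_device_list("music_topic"))  # ['smart_tv', 'smart_speaker']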
After obtaining the target device list, the server 400 may screen the list a second time according to the distance between the sound source position and each smart device 200. That is, the server 400 searches the target device list for the executing device, the smart device 200 closest to the position where the voice control instruction was initiated.
To detect the distance between the instruction-initiating position and the smart devices 200, the smart devices 200 may report the user voice audio data they each collected to the server 400, and a determination module built into the server 400 may determine the sound source position, or the distance between the sound source and each smart device 200, from the reported voice audio data.
In some embodiments, the server 400 may obtain the voice control instruction reported by each smart device 200 in the target device list and extract the sound energy value from it. By comparing the energy values, the smart device 200 with the highest sound energy value is found and marked as the executing device.
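A sketch of that energy comparison; the reported values are invented for illustration.

    energy_reports = {"smart_tv": 0.42, "smart_speaker": 0.71}  # device -> energy

    executing_device = max(energy_reports, key=energy_reports.get)
    print(executing_device)  # smart_speaker: heard the user loudest, so nearest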
In the specific sound source position detection process, the computation can be based on the time difference of arrival of the sound and on the energy ratio of direct sound to reverberant sound. For a given room the reverberation time T60 is fixed, i.e., the time required for the energy to decay by 60 dB is the same at any position, and T60 can be estimated from the direct-to-reverberant energy ratio at the corresponding position. Therefore, from the beamforming spectrogram and the time difference of arrival, the direct-to-reverberant energy ratio of every smart device 200 in the environment with respect to the sound source can be obtained, and from it the direct energy; ranking the direct sound energy received at each device then identifies the smart device 200 closest to the sound source position.
In some embodiments, the detection of the distance between the sound source position and the smart devices 200 may instead be completed by the smart devices 200 themselves. A smart device 200 may capture images of the current environment with multiple cameras, construct a three-dimensional space model from the multi-angle images, and extract the human figure from the model through image recognition so as to locate the user, i.e., the sound source, in the three-dimensional space model. After locating the sound source, the smart device 200 determines the distance between the sound source position and each smart device 200 according to the placement state of the current smart home model, and finally sends the computed distances to the server 400 so that the server 400 can take the smart device 200 closest to the sound source position as the executing device.
In some embodiments, the distance between the sound source position and the smart devices 200 may also be obtained by localizing the sound source from the time differences at which multiple microphones pick up the same voice audio data. Since the propagation speed of sound in air is fixed and known, when microphones at different positions detect the same voice audio data, the distance between the sound source position and each smart device 200 can be computed from the differences between the detected sound onset times and the positions of the smart devices 200. This computation may be completed centrally by the server 400: the smart devices 200 report the voice audio data they each detected, and the server 400 extracts from each the time at which the voice was detected and computes the distance between the sound source position and each smart device 200.
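A minimal sketch of the time-difference idea, with invented onset times:

    SPEED_OF_SOUND = 343.0                    # m/s in air

    onset_tv, onset_speaker = 0.0120, 0.0031  # seconds at which each device heard it
    extra_path = SPEED_OF_SOUND * (onset_tv - onset_speaker)
    print(f"the TV is {extra_path:.2f} m farther from the sound source")  # ~3.05 m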
Since different smart home systems contain different numbers and kinds of smart devices 200, the number of smart devices 200 in the target device list produced by matching user intention against device capability varies, and the server 400 queries for the executing device differently depending on that number. That is, as shown in Fig. 7, in some embodiments the server 400 may obtain the number of devices in the target device list after generating it.
If the number of devices is greater than 1, the screening by user intention and device capability has found several smart devices 200 that satisfy the current user intention. The server 400 therefore screens the target device list a second time under the nearby-decision principle, in the manner provided in the above embodiments, i.e., performs the step of searching the target device list for the executing device.
If the number of devices equals 0, the matching of user intention against device capability has determined that no smart device 200 in the current smart home system satisfies the user intention, or that those which do are offline. The user's voice audio data still needs prompt feedback, so the server 400 may search the network device table for an executing device, i.e., take the smart device 200 closest to the sound source position in the whole smart home system as the executing device, which then feeds back the prompt voice.
For example, when the user says "reduce the temperature in the refrigerator", the user intention corresponding to the voice control instruction is fridge_control_topic, and the server 400 matches the required device capability, fridge_control, in the mapping database. However, checking the online states of all smart devices 200 in the current smart home system shows that the smart device 200 with the fridge_control capability is powered off and no other smart device 200 has that capability; the server 400 therefore cannot screen out a target device list, i.e., the list contains 0 devices. In such a situation the server 400 may take the smart device 200 closest to the user's position in the whole smart home system, e.g. the display device, as the executing device, so as to play a feedback prompt such as "I don't understand what you said" or "the refrigerator is powered off, please power it on".
If the number of devices equals 1, the target device list contains only one smart device 200 able to support the current user intention; to reduce the data processing amount, the server 400 may therefore skip the subsequent nearby decision and directly mark that smart device 200 as the executing device. For example, when the user says "I want to watch a movie" and the server 400 finds that only the smart television has the video_play capability, the target device list contains a single smart device 200, so the server 400 directly takes the smart television as the executing device to play the voice "Found the following movies for you".
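A sketch of this branching on the device count; locate is an assumed callable returning a device's distance to the sound source.

    def choose_executing_device(target_list, network_table, locate):
        if len(target_list) == 1:
            return target_list[0]        # single candidate: respond directly
        if len(target_list) == 0:
            # no device can realize the intention (or it is offline): the device
            # nearest the user in the whole system plays the feedback prompt
            return min(network_table, key=locate)
        # several candidates: nearby decision among the targets
        return min(target_list, key=locate)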
After querying the obtaining execution device, the server 400 may issue the initially obtained voice control instruction to the execution device, so that the execution device may execute a voice response for the voice control instruction. For example, when the user inputs the voice "i want to listen to music", the server 400 may first filter out the list of target devices that support the user's intention, i.e., "smart tv", "smart speaker", according to the user's intention "music _ topic". And detecting the distance between the intelligent television and the intelligent sound box and the sound source position according to the distance detection process, and determining the intelligent television closest to the sound source position as an execution device, so that a voice control instruction is sent to the intelligent television, the intelligent television can respond to the voice control instruction to feed back a voice response that music is played for you, and a voice playing function is started to play music.
According to the technical scheme, when the smart home system includes a plurality of smart devices 200 capable of responding to the voice control instruction, the smart voice control method provided in the above embodiment may determine the target device list by screening of the user intention and the device capability, then screen the smart device 200 closest to the sound source position from the target device list as the execution device according to the nearby decision-making principle, and finally send the voice control instruction to the execution device, so that the execution device may respond to the voice control instruction, and other smart devices 200 may keep silent, thereby alleviating the problems of mutual interference and control confusion caused by simultaneous responses of a plurality of devices.
For some functions, the smart device 200 needs to launch the related application before responding, and starting the application takes a certain amount of time, so after inputting the voice control instruction the user has to wait a while before the function takes effect. When the smart device 200 is already in a suitable playing state, it does not need to launch the application again and can respond to the voice control directly, greatly reducing the user's waiting time. Therefore, to improve the response speed of the voice interaction process, in some embodiments the server 400 may detect the playing state of each smart device 200 when searching for the execution device in the target device list according to the nearby decision rule, so that a smart device 200 in a suitable playing state is preferred for responding to the user's voice interaction instruction.
That is, in the step of searching for the execution device in the target device list, the server 400 may obtain the playing state of each smart device in the target device list, set an execution weight for each smart device 200 according to its playing state, and add the execution weight to the target device list.
The server 400 may set execution weights differently for different user intents or device capabilities. For example, when the device capability required by the user intent is "video_play", the "ready" state should carry a higher execution weight than the "power on" state, so the execution weight of a smart device 200 in the "ready" state may be set to 1 and that of one in the "power on" state to 0.5.
After adding the execution weights to the target device list, the server 400 may determine the execution device according to the execution weights together with the sound source positioning result. That is, in the step of searching for the execution device in the target device list, the server 400 may first obtain the sound source positioning information reported by each smart device in the target device list, where the sound source positioning information includes the distance between the smart device and the position from which the voice control instruction was initiated. If the target device list includes execution weights, the server 400 corrects the execution weights according to the sound source positioning information and marks the smart device 200 with the largest execution weight as the execution device.
For example, the server 400 may set a correction coefficient for each preset distance interval: 1 for the interval 0-2 m, 0.8 for 2-4 m, 0.6 for 4-6 m, and so on. Combining this with the playing state, device A, which is in the "ready" state and in the 2-4 m interval, gets an execution weight of 1 × 0.8 = 0.8, while device B, which is in the "power on" state and in the 0-2 m interval, gets an execution weight of 0.5 × 1 = 0.5; device A is therefore determined to be the execution device.
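The weighting arithmetic can be sketched as follows, using the illustrative coefficients and play-state weights from the example above. The function names, and the assumption that the coefficient keeps decreasing by 0.2 for every further 2 m interval, are the editor's, not part of the disclosure.

```python
def distance_coefficient(distance_m: float) -> float:
    # Illustrative correction coefficients: 0-2 m -> 1.0, 2-4 m -> 0.8,
    # 4-6 m -> 0.6, decreasing by 0.2 for each further 2 m interval.
    return max(1.0 - 0.2 * int(distance_m // 2), 0.0)

# Illustrative play-state weights for a "video_play" intent, per the text.
PLAY_STATE_WEIGHT = {"ready": 1.0, "power_on": 0.5}

def corrected_weight(play_state: str, distance_m: float) -> float:
    return PLAY_STATE_WEIGHT[play_state] * distance_coefficient(distance_m)

# Device A: "ready", in the 2-4 m interval   -> 1.0 * 0.8 = 0.8
# Device B: "power on", in the 0-2 m interval -> 0.5 * 1.0 = 0.5
print(corrected_weight("ready", 3.0))     # 0.8 -> device A is the executor
print(corrected_weight("power_on", 1.0))  # 0.5
```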
It can be seen that, in the above embodiment, the execution device is determined based on both the nearby decision rule and the weight information: a weighted comparison is performed on top of the original distance-based result, and the smart device 200 with the largest value is woken up as the execution device. This balances responding nearby against responding quickly, improving the user's voice interaction experience.
In the above embodiments, the first screening of the smart devices 200 in the current smart home system by the server 400 is based on matching the user intent against the device capabilities in the mapping database. Obviously, to keep this first screening accurate, when a new device in the smart home system comes online, the server 400 needs to update the mapping database according to the device capabilities of that device.
That is, as shown in fig. 8, in some embodiments, the server 400 may receive an online request sent by a smart device and, in response to the online request, obtain the identification information of that smart device. It then matches the device capability information supported by the smart device 200 according to the identification information, and stores the identification information together with the supported device capability information, thereby updating the mapping database.
For example, when a new smart refrigerator joins the current smart home system, it needs to send an online request to the server 400. The online request may include the identification information of the smart refrigerator, that is, the device model "hxxx-BCD-xxxWTDVBPV". The server 400 looks up the device capabilities supported by the smart refrigerator according to the device model, namely "video_play", "music_play", and "fridge_control", and stores the found device capabilities, device model, and related contents in the mapping database, thereby updating it.
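A minimal sketch of this mapping-database update might look like the following. The `CAPABILITIES_BY_MODEL` lookup table, the `handle_online_request` helper, and the in-memory `mapping_db` are hypothetical stand-ins for the server's actual storage module.

```python
# Hypothetical model-to-capability lookup; the model string and capability
# names follow the refrigerator example above.
CAPABILITIES_BY_MODEL = {
    "hxxx-BCD-xxxWTDVBPV": ["video_play", "music_play", "fridge_control"],
}

mapping_db: dict[str, list[str]] = {}  # identification info -> capabilities

def handle_online_request(device_model: str) -> None:
    # Match the capabilities supported by this model and persist them, so the
    # first screening step can use them when filtering by user intent.
    mapping_db[device_model] = CAPABILITIES_BY_MODEL.get(device_model, [])

handle_online_request("hxxx-BCD-xxxWTDVBPV")
print(mapping_db)
```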
Based on the above intelligent voice control method, some embodiments of the present application also provide a server 400. As shown in fig. 9, the server 400 includes a storage module 410, a communication module 420, and a control module 430. The storage module 410 is configured to store a mapping database, where the mapping database includes mapping relationships between user intention information and device capability information; the communication module 420 is configured to establish communication connections with a plurality of smart devices; and the control module 430 is configured to perform the following program steps (a sketch of these steps follows the list):
receiving a voice control instruction sent by a smart device;
extracting user intention information from the voice control instruction in response to the voice control instruction;
generating a target device list, where the target device list includes the smart devices, matched from the mapping database, that can realize the user intention information;
searching for an execution device in the target device list, where the execution device is the smart device closest to the position from which the voice control instruction was initiated;
and sending the voice control instruction to the execution device to trigger the execution device to execute a voice response to the voice control instruction.
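Taken together, the configured steps might be sketched as follows. Every name here (`SmartDevice`, `MAPPING_DB`, `extract_intent`, `handle_instruction`) is a hypothetical stand-in, and the intent classifier is reduced to a keyword check purely for illustration; the disclosure describes a machine-learning model for that step.

```python
from dataclasses import dataclass

@dataclass
class SmartDevice:
    name: str
    capabilities: set
    distance_m: float

    def send(self, instruction: str) -> None:
        print(f"{self.name} responds to: {instruction}")

# Hypothetical intent -> required-capability mapping, per the earlier examples.
MAPPING_DB = {"music_topic": "music_play", "video_play_topic": "video_play"}

def extract_intent(instruction: str) -> str:
    # Stand-in for the machine-learning intent classifier described earlier.
    return "music_topic" if "music" in instruction else "video_play_topic"

def handle_instruction(instruction: str, devices: list) -> None:
    intent = extract_intent(instruction)                 # step 2
    capability = MAPPING_DB[intent]
    targets = [d for d in devices if capability in d.capabilities]  # step 3
    executor = min(targets, key=lambda d: d.distance_m)  # step 4: nearby rule
    executor.send(instruction)                           # step 5

devices = [
    SmartDevice("smart_tv", {"video_play", "music_play"}, 2.5),
    SmartDevice("smart_speaker", {"music_play"}, 4.0),
]
handle_instruction("i want to listen to music", devices)  # smart_tv responds
```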
In conjunction with the server 400 described above, some embodiments also provide a smart device 200. As shown in fig. 10, the smart device 200 includes an audio input device, an audio output device, a communicator 220, and a controller 250. The audio input device is configured to detect voice audio data input by a user; the audio output device is configured to play a voice response; the communicator 220 is configured to establish a communication connection with the server 400; and the controller 250 is configured to perform the following program steps (a device-side sketch follows the list):
acquiring voice audio data input by the user;
generating a voice control instruction according to the voice audio data;
sending the voice control instruction to the server, so that the server extracts user intention information from the voice control instruction, matches the smart devices capable of realizing the user intention information in the mapping database to generate a target device list, searches for the execution device in the target device list, and issues the voice control instruction to it;
and executing a voice response to the voice control instruction.
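A device-side sketch of these controller steps is given below, with fake audio and network stubs standing in for real hardware; none of these class names come from the disclosure, and a real device would use its platform's audio and networking APIs.

```python
# Stubs standing in for real audio hardware and the network link to the server.
class FakeAudioIn:
    def record(self) -> bytes:
        return b"\x00\x01"  # placeholder PCM frames

class FakeAudioOut:
    def play(self, response: str) -> None:
        print("playing:", response)

class FakeServerLink:
    def send(self, payload: bytes) -> None:
        self.sent = payload

    def receive(self) -> str:
        # The server replies only to the device it selected as the executor.
        return "music is being played for you"

def run_voice_turn(audio_in, audio_out, link) -> None:
    audio = audio_in.record()   # step 1: acquire voice audio data
    link.send(audio)            # steps 2-3: send the voice control instruction
    reply = link.receive()      # the server decides which device executes
    if reply is not None:
        audio_out.play(reply)   # step 4: execute the voice response

run_voice_turn(FakeAudioIn(), FakeAudioOut(), FakeServerLink())
```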
According to the above technical solutions, the server 400 and the smart device 200 provided in these embodiments can form a smart home system. After the user inputs a voice control instruction, the smart home system sends it to the server 400, which extracts the user intention information from the voice control instruction and matches, in the mapping database, the smart devices capable of realizing that intention to generate a target device list. The server then searches the target device list for the execution device that is closest to the user or has the largest execution weight, and issues the voice control instruction to it, so that it responds to the voice control instruction input by the user.
The method can determine the execution device that responds to the user's voice control instruction based on the user intent matching result and the sound source positioning result, so that when the smart home system includes multiple smart devices 200 capable of responding to the voice control instruction, only the execution device responds. This alleviates both control confusion and the situation in which a device responds only to feed back that it cannot support the requested control service.
The embodiments provided in the present application are only a few examples of the general concept of the present application and do not limit its scope. Any other embodiments extended from the solution of the present application without inventive effort by a person skilled in the art will fall within its scope of protection.

Claims (10)

1. A server, comprising:
the storage module is configured to store a mapping database, wherein the mapping database comprises mapping relationships between user intention information and device capability information;
a communication module configured to establish communication connections with a plurality of smart devices;
a control module configured to:
receiving a voice control instruction sent by a smart device;
extracting user intention information from the voice control instruction in response to the voice control instruction;
generating a target device list, wherein the target device list comprises the smart devices which are obtained by matching in the mapping database and which can realize the user intention information;
searching for an execution device in the target device list, wherein the execution device is the smart device closest to the position at which the voice control instruction was initiated;
and sending the voice control instruction to the execution device to trigger the execution device to execute a voice response to the voice control instruction.
2. The server of claim 1, wherein the voice control instruction comprises voice audio data input by a user, and wherein the control module is further configured to:
in the step of extracting user intention information from the voice control instruction, converting the voice audio data into text information;
inputting the text information into a text processing model, wherein the text processing model is an artificial intelligence model based on machine learning;
acquiring a recognition result output by the text processing model for the text information, wherein the recognition result is the classification probability of the text information over preset user intention labels;
and extracting the user intention label with the highest classification probability in the recognition result as the user intention information.
3. The server of claim 1, wherein the control module is further configured to:
receiving an online request sent by the smart device;
acquiring identification information of the smart device in response to the online request;
matching the device capability information supported by the smart device according to the identification information;
and storing the identification information of the smart device and the device capability information supported by the smart device.
4. The server of claim 1, wherein the control module is further configured to:
in the step of generating the target device list, acquiring a current network device list, wherein the network device list comprises all the smart devices in the smart home system where the smart device is located;
traversing the device capability information required to support the user intention information to obtain a target capability;
and searching for smart devices with the target capability in the network device list to generate the target device list.
5. The server of claim 4, wherein the control module is further configured to:
after the step of generating the target device list, acquiring the number of devices in the target device list;
if the number of devices is equal to 0, searching for the execution device in the network device list;
if the number of devices is equal to 1, marking the smart device in the target device list as the execution device;
and if the number of devices is greater than 1, searching for the execution device in the target device list.
6. The server of claim 1, wherein the control module is further configured to:
in the step of searching for the execution device in the target device list, acquiring the voice control instruction reported by each smart device in the target device list;
extracting an acoustic energy value from each voice control instruction;
comparing the acoustic energy values to obtain the smart device with the highest acoustic energy value;
and marking the smart device with the highest acoustic energy value as the execution device.
7. The server of claim 1, wherein the control module is further configured to:
in the step of searching for the execution device in the target device list, acquiring the playing state of each smart device in the target device list;
setting an execution weight for each smart device according to its playing state;
and adding the execution weights to the target device list.
8. The server of claim 7, wherein the control module is further configured to:
in the step of searching for the execution device in the target device list, acquiring sound source positioning information reported by each smart device in the target device list, wherein the sound source positioning information comprises the distance between the smart device and the position at which the voice control instruction was initiated;
if the target device list comprises the execution weights, correcting the execution weights according to the sound source positioning information;
and marking the smart device with the largest execution weight as the execution device.
9. A smart device, comprising:
an audio input device configured to detect voice audio data input by a user;
an audio output device configured to play a voice response;
a communicator configured to establish a communication connection with a server;
a controller configured to:
acquiring voice audio data input by a user;
generating a voice control instruction according to the voice audio data;
sending the voice control instruction to the server, so that the server extracts user intention information from the voice control instruction, matches the smart devices capable of realizing the user intention information in the mapping database to generate a target device list, searches for an execution device in the target device list, and issues the voice control instruction to it;
and executing a voice response to the voice control instruction.
10. An intelligent voice control method, applied to a smart home system, wherein the smart home system comprises a server and smart devices, and a communication connection is established between the server and the smart devices; the intelligent voice control method comprises the following steps:
the smart device acquires voice audio data input by a user and generates a voice control instruction according to the voice audio data;
the smart device sends the voice control instruction to the server;
the server extracts user intention information from the voice control instruction and matches, in the mapping database, the smart devices capable of realizing the user intention information to generate a target device list;
the server searches for an execution device in the target device list and issues the voice control instruction to the execution device;
and the smart device serving as the execution device executes a voice response to the voice control instruction.
CN202111521241.7A 2021-12-13 2021-12-13 Server, intelligent equipment and intelligent voice control method Pending CN114067798A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111521241.7A CN114067798A (en) 2021-12-13 2021-12-13 Server, intelligent equipment and intelligent voice control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111521241.7A CN114067798A (en) 2021-12-13 2021-12-13 Server, intelligent equipment and intelligent voice control method

Publications (1)

Publication Number Publication Date
CN114067798A true CN114067798A (en) 2022-02-18

Family

ID=80229321

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111521241.7A Pending CN114067798A (en) 2021-12-13 2021-12-13 Server, intelligent equipment and intelligent voice control method

Country Status (1)

Country Link
CN (1) CN114067798A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114697151A (en) * 2022-03-15 2022-07-01 杭州控客信息技术有限公司 Intelligent home system with non-voice awakening function and non-voice awakening method thereof
CN114697151B (en) * 2022-03-15 2024-06-07 杭州控客信息技术有限公司 Intelligent home system with non-voice awakening function and voice equipment awakening method
CN114678022A (en) * 2022-03-25 2022-06-28 青岛海尔科技有限公司 Voice control method and device for terminal equipment, storage medium and electronic equipment
CN115277281A (en) * 2022-09-30 2022-11-01 南昌华飞物联技术有限公司 Intelligent household control method and device, computer equipment and readable storage medium
CN116582382A (en) * 2023-07-11 2023-08-11 北京探境科技有限公司 Intelligent device control method and device, storage medium and electronic device
CN116582382B (en) * 2023-07-11 2023-09-29 北京探境科技有限公司 Intelligent device control method and device, storage medium and electronic device

Similar Documents

Publication Publication Date Title
CN114067798A (en) Server, intelligent equipment and intelligent voice control method
CN111989741B (en) Speech-based user interface with dynamically switchable endpoints
CN114172757A (en) Server, intelligent home system and multi-device voice awakening method
EP3314876B1 (en) Technologies for conversational interfaces for system control
CN105471705B (en) Intelligent control method, equipment and system based on instant messaging
WO2016206494A1 (en) Voice control method, device and mobile terminal
CN110853619B (en) Man-machine interaction method, control device, controlled device and storage medium
KR20230107384A (en) Generating iot-based notification(s) and provisioning of command(s) to cause automatic rendering of the iot-based notification(s) by automated assistant client(s) of client device(s)
WO2015062438A1 (en) Instructing mode switching method and apparatus based on smart television interface
WO2017141530A1 (en) Information processing device, information processing method and program
JP2014002737A (en) Server and control method of server
CN109450745A (en) Information processing method, device, intelligence control system and intelligent gateway
US20200213653A1 (en) Automatic input selection
CN114120996A (en) Voice interaction method and device
WO2022268136A1 (en) Terminal device and server for voice control
CN116566760B (en) Smart home equipment control method and device, storage medium and electronic equipment
WO2024108905A1 (en) Server, intelligent device, and intelligent device control method
CN111539203A (en) Method, equipment and system for disambiguating natural language content title
CN115240665A (en) Display apparatus, control method, and storage medium
WO2018023518A1 (en) Smart terminal for voice interaction and recognition
KR20210154565A (en) Electronic teaching desk based on Internet of Things
CN113296415A (en) Intelligent household electrical appliance control method, intelligent household electrical appliance control device and system
CN114627864A (en) Display device and voice interaction method
CN111312248A (en) Interaction method, device, system and storage medium
CN113129578A (en) Matching method, control method, system and storage medium of infrared equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination