CN111427444B

CN111427444B - Control method and device of intelligent device

Info

Publication number: CN111427444B
Application number: CN201811583306.9A
Authority: CN
Inventors: 杨一帆; 刘峥强; 孟越涛; 罗红
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Hangzhou Information Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Hangzhou Information Technology Co Ltd
Priority date: 2018-12-24
Filing date: 2018-12-24
Publication date: 2022-05-10
Anticipated expiration: 2038-12-24
Also published as: CN111427444A

Abstract

The invention discloses a control method and device for a smart device. It addresses a problem with existing smart devices: they cannot determine a user's precise intention from an ambiguous expression, which gives a poor human-computer interaction experience. After the network-side device acquires the user's voice demand information, it first tries to determine the skill domain corresponding to that information. If it cannot, it plays inquiry information about a skill domain corresponding to the voice demand information to the user through the smart device. Once the skill domain required by the user has been determined, the network-side device executes, through the smart device, the operation corresponding to the user's voice demand information according to that skill domain. Because the network-side device initiates an inquiry whenever the user's intention is unclear, the user can select the required skill domain. This improves the accuracy with which the network-side device determines the user's intention.

Description

Control method and device of intelligent device

Technical Field

The invention relates to the field of Internet technology, and in particular to a control method and control device for a smart device.

Background

With the rapid development of science and technology in recent years, smart devices have sprung up like bamboo shoots after a spring rain and now pervade every aspect of daily life. Among them, smart devices aimed at creating a higher-quality family life have entered a period of explosive growth. For example, smart devices such as smart speakers and smart watches provide voice recognition and Internet access functions.

In actual use, a user can converse with a smart device that has a voice recognition function, such as a smart speaker. The user can issue a control instruction asking the smart speaker to check the weather or play music. The user can also chat with the smart speaker or ask it questions.

At present, smart speakers on the market can only guess at the user's intention. When the user's words may have multiple meanings, the system cannot determine the user's real intention and simply recommends a reply for one of those meanings at random. The system's hit rate for the user's intention is therefore low. Users often feel that the smart speaker gives irrelevant answers, yet they do not know how to phrase a request so as to trigger the skill they need.

In summary, existing smart devices cannot determine a user's precise intention from an ambiguous expression, which results in a poor human-computer interaction experience.

Disclosure of Invention

The invention provides a control method and device for a smart device. It addresses the problem that existing smart devices cannot determine a user's precise intention from an ambiguous expression, which results in a poor human-computer interaction experience.

The specific technical solutions provided by the invention are as follows:

In a first aspect, an embodiment of the present invention provides a method for controlling a smart device, including:

the network-side device acquires the user's voice demand information through the smart device;

the network-side device plays inquiry information about a skill domain corresponding to the voice demand information to the user through the smart device; and

the network-side device uses the user's voice response information, collected by the smart device, to determine that the skill domain corresponding to the played inquiry information is the skill domain required by the user. It then executes, through the smart device, the operation corresponding to the voice demand information according to the determined skill domain.

With this method, the network-side device acquires the user's voice demand information and tries to determine the corresponding skill domain. If it cannot, it plays inquiry information about a skill domain corresponding to the voice demand information to the user through the smart device. Once the skill domain required by the user has been determined, the network-side device executes the operation corresponding to the user's voice demand information through the smart device according to that skill domain. Because the network-side device initiates an inquiry whenever the user's intention is unclear, the user can select the required skill domain. This improves the accuracy with which the network-side device determines the user's intention.

In an optional implementation, some steps take place after the network-side device collects the user's voice demand information through the smart device and before it plays inquiry information about a corresponding skill domain to the user. In this interval, the method further includes:

the network-side device determines candidate skill domains according to the voice demand information;

if the network-side device determines only one candidate skill domain, it takes that candidate skill domain as the skill domain corresponding to the voice demand information; or

if the network-side device determines a plurality of candidate skill domains, it takes the candidate skill domains that satisfy the inquiry condition as the skill domains corresponding to the voice demand information.

With this method, the voice demand information may have a plurality of candidate skill domains. In that case, the network-side device takes only the candidate skill domains that satisfy the inquiry condition as the skill domains corresponding to the voice demand information. It does not ask the user, through the smart device, about candidate skill domains that fail the inquiry condition. This reduces the number of useless inquiries, simplifies human-computer interaction and improves the skill-domain hit rate.

In an optional implementation, the network-side device determines the candidate skill domains that satisfy the inquiry condition as follows:

the network-side device ranks the candidate skill domains by their association degree with the voice demand information and selects the first N of them;

if more than one of the association degrees between the first N candidate skill domains and the voice demand information is greater than or equal to a first threshold, the candidate skill domains with those association degrees are the candidate skill domains that satisfy the inquiry condition; or

if none of the association degrees between the first N candidate skill domains and the voice demand information is greater than or equal to the first threshold, the candidate skill domains that satisfy the inquiry condition are the candidate skill domain with the maximum association degree together with the target candidate skill domains. A target candidate skill domain is a candidate skill domain whose association degree differs from the maximum association degree by no more than a second threshold;

wherein N is a positive integer greater than 1.
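The inquiry condition can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not the patent's implementation:

* Association degrees are assumed to be precomputed floats keyed by skill domain.
* The function name `inquiry_candidates` and the parameters `n`, `t1` and `t2` are hypothetical. They stand for N, the first threshold and the second threshold.
* An empty result means that, under this rule, no inquiry is made.

```python
from typing import Dict, List


def inquiry_candidates(scores: Dict[str, float], n: int, t1: float, t2: float) -> List[str]:
    """Return the candidate skill domains that satisfy the inquiry condition.

    scores: association degree of each candidate skill domain with the demand
    n:      number of top-ranked candidates to keep (N > 1)
    t1:     first threshold
    t2:     second threshold (maximum gap to the best degree for a target candidate)
    """
    if n <= 1:
        raise ValueError("N must be a positive integer greater than 1")
    # Rank candidates by association degree, highest first, and keep the first N.
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
    if not top:
        return []
    reaching_t1 = [domain for domain, score in top if score >= t1]
    if len(reaching_t1) > 1:
        # Several degrees reach the first threshold: ask about all of them.
        return reaching_t1
    if not reaching_t1:
        # No degree reaches the first threshold: ask about the best candidate
        # plus every target candidate within t2 of the maximum degree.
        best_domain, best_score = top[0]
        targets = [domain for domain, score in top[1:] if best_score - score <= t2]
        if targets:
            return [best_domain] + targets
    # Exactly one degree reaches t1, or there is no target candidate:
    # the device executes directly instead of asking.
    return []
```

For example, take the degrees {"drama": 0.82, "commentary": 0.78, "voices": 0.40}, with N = 3 and a first threshold of 0.7. Both "drama" and "commentary" reach the threshold, so the user would be asked about both.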

With this method, an ambiguous user expression may yield several candidate skill domains whose association degrees with the voice demand information are close. The network-side device then carries out multiple rounds of inquiry about those skill domains, so that the skill domain required by the user is not missed.

In an optional implementation, exactly one of the association degrees between the first N candidate skill domains and the voice demand information may be greater than or equal to the first threshold. In that case, the network-side device executes, through the smart device, the operation corresponding to the voice demand information according to the skill domain with that association degree; or

none of the association degrees between the first N candidate skill domains and the voice demand information may be greater than or equal to the first threshold, and no target candidate skill domain may exist. In that case, the network-side device executes, through the smart device, the operation corresponding to the voice demand information according to the candidate skill domain with the maximum association degree.
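The two direct-execution cases can be sketched the same way. As before, the association degrees are assumed to be precomputed, and `direct_execution_domain`, `n`, `t1` and `t2` are hypothetical names. The function returns the skill domain to execute without asking the user, or `None` when an inquiry is needed instead.

```python
from typing import Dict, Optional


def direct_execution_domain(scores: Dict[str, float], n: int, t1: float, t2: float) -> Optional[str]:
    """Skill domain to execute without an inquiry, or None if the user must be asked."""
    if n <= 1:
        raise ValueError("N must be a positive integer greater than 1")
    # Keep the first N candidates ranked by association degree.
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
    if not top:
        return None
    reaching_t1 = [domain for domain, score in top if score >= t1]
    if len(reaching_t1) == 1:
        # Exactly one association degree reaches the first threshold.
        return reaching_t1[0]
    if not reaching_t1:
        best_domain, best_score = top[0]
        # No degree reaches t1: execute directly only if no other candidate
        # lies within t2 of the maximum (i.e. no target candidate exists).
        if all(best_score - score > t2 for _, score in top[1:]):
            return best_domain
    return None
```

With the same thresholds (0.7 and 0.1), the degrees {"weather": 0.90, "music": 0.20} lead straight to "weather". The degrees {"drama": 0.60, "commentary": 0.55} instead trigger an inquiry, because the two are within the second threshold of each other.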

With this method, the network-side device may identify a candidate skill domain that clearly matches the intention behind the user's voice demand information. It then executes the corresponding operation directly through the smart device according to that candidate skill domain, without asking the user through the smart speaker. This reduces the number of human-computer interactions, simplifies system interaction and makes the smart device more intelligent, which improves the user experience.

In an optional implementation, the network-side device determines the association degree between a candidate skill domain and the voice demand information as follows:

for any candidate skill domain, the network-side device determines a public-domain association degree between the candidate skill domain and the voice demand information through a first training model. It determines a private-domain association degree between the candidate skill domain and the voice demand information through a second training model;

the network-side device then determines the association degree between the candidate skill domain and the voice demand information from the public-domain and private-domain association degrees, according to the weights allocated to each.
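The weighted combination can be sketched as follows. Two points are assumptions not stated in the description: each model outputs a score in [0, 1], and the two allocation weights sum to 1. The names `combined_association` and `public_weight` are hypothetical.

```python
from typing import Dict


def combined_association(
    public: Dict[str, float],
    private: Dict[str, float],
    public_weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted association degree for every candidate skill domain.

    public:  public-domain degrees from the (hypothetical) first model
    private: private-domain degrees from the (hypothetical) second model;
             a domain missing here is treated as 0.0 (an assumption)
    public_weight: assumed weight of the public degree; the private weight
                   is 1 - public_weight
    """
    if not 0.0 <= public_weight <= 1.0:
        raise ValueError("public_weight must lie in [0, 1]")
    private_weight = 1.0 - public_weight
    return {
        domain: public_weight * score + private_weight * private.get(domain, 0.0)
        for domain, score in public.items()
    }
```

For example, the public scores for "drama" and "commentary" might tie at 0.5. If the user's own history gives "commentary" a private score of 0.9 and "drama" 0.2, a public weight of 0.6 ranks "commentary" (0.66) ahead of "drama" (0.38).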

With this method, the network-side device can recognize personalized intentions from the user's usage habits. This builds a personalized configuration for the user and improves the efficiency and accuracy of skill-domain hits.

In an optional implementation, the network-side device updates the sample data of the first training model according to information crawled from the network; and/or

the network-side device updates the sample data of the second training model according to historical voice demand information and the corresponding skill domains.
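The description does not say what form the second training model takes. A simple frequency table can serve as a stand-in to illustrate how historical (voice demand, skill domain) pairs feed the private-domain score. This is an assumed substitute, not the patent's model; the class and method names are hypothetical. Crawling for the first model is not sketched.

```python
from collections import Counter, defaultdict
from typing import Dict


class PrivateDomainModel:
    """Toy stand-in for the second training model (an assumption).

    It scores a skill domain by the fraction of this user's past demands with
    the same normalized text that were resolved to that domain.
    """

    def __init__(self) -> None:
        self._history: Dict[str, Counter] = defaultdict(Counter)

    @staticmethod
    def _key(demand_text: str) -> str:
        return demand_text.strip().lower()

    def add_sample(self, demand_text: str, resolved_domain: str) -> None:
        # Update the sample data with one historical (demand, skill domain) pair.
        self._history[self._key(demand_text)][resolved_domain] += 1

    def score(self, demand_text: str, domain: str) -> float:
        # .get() avoids creating empty entries for unseen demands.
        counts = self._history.get(self._key(demand_text))
        if not counts:
            return 0.0
        return counts[domain] / sum(counts.values())
```

Suppose a user's earlier "Romance of the Three Kingdoms" requests were confirmed as "commentary" three times and "drama" once. The private scores for that request would then be 0.75 and 0.25.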

With this method, the network-side device can determine the intention behind a vague user expression from the user's history. This improves the efficiency and accuracy of skill-domain hits and makes the device more convenient to use.

In an optional implementation, after the network-side device plays inquiry information about a skill domain corresponding to the voice demand information to the user through the smart device, the method further includes:

the network-side device judges, according to the voice response information, whether the skill domain required by the user can be determined;

if the network-side device determines that the skill domain corresponding to the played inquiry information is not the skill domain required by the user, it judges whether any unused skill domain remains among the skill domains corresponding to the voice demand information;

if so, it selects one skill domain from the unused skill domains, plays inquiry information about that skill domain to the user through the smart device, and returns to the step of judging whether the skill domain required by the user can be determined;

otherwise, it stops playing inquiry information about skill domains corresponding to the voice demand information to the user through the smart device.
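The multi-round inquiry loop can be sketched in a few lines. The `ask` callback is hypothetical. It stands for playing one inquiry through the smart device and interpreting the user's response as confirmed (`True`) or not (`False`). The sketch treats an unclear response like a rejection, which is an assumption: the description does not cover that case.

```python
from typing import Callable, Iterable, Optional


def confirm_skill_domain(
    domains: Iterable[str],
    ask: Callable[[str], bool],
) -> Optional[str]:
    """Ask about candidate skill domains one at a time until one is confirmed.

    domains: skill domains corresponding to the demand, in the order to ask
    ask:     hypothetical callback that plays the inquiry for one domain and
             returns True if the user's response confirms it
    Returns the confirmed domain, or None once no unused domain remains.
    """
    for domain in domains:  # each round uses one previously unused domain
        if ask(domain):
            return domain
        # Not the required domain: fall through to the next unused one.
    return None  # no unused domain left: stop playing inquiries
```

Because the loop consumes `domains` in order, the caller controls which skill domain is asked about first. One natural choice is descending association degree.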

With this method, the network-side device initiates an inquiry whenever the user's intention is unclear, so that the user can select the required skill domain. This improves the accuracy with which the network-side device determines the user's intention.

In a second aspect, an embodiment of the present invention further provides a network-side device for controlling a smart device, including a processor and a memory, where:

the processor is configured to perform the following steps. It acquires the user's voice demand information through the smart device. It plays inquiry information about a skill domain corresponding to the voice demand information to the user through the smart device. It uses the user's voice response information, collected by the smart device, to determine that the skill domain corresponding to the played inquiry information is the skill domain required by the user. It then executes, through the smart device, the operation corresponding to the voice demand information according to the determined skill domain.

In one possible implementation, the processor is further configured to:

after collecting the user's voice demand information through the smart device, determine candidate skill domains according to the voice demand information. If only one candidate skill domain is determined, take it as the skill domain corresponding to the voice demand information. If a plurality of candidate skill domains are determined, take the candidate skill domains that satisfy the inquiry condition as the skill domains corresponding to the voice demand information.

In one possible implementation, the processor determines the candidate skill domains that satisfy the inquiry condition as follows:

rank the candidate skill domains by their association degree with the voice demand information and select the first N of them. If more than one of the association degrees between the first N candidate skill domains and the voice demand information is greater than or equal to a first threshold, take the candidate skill domains with those association degrees as the candidate skill domains that satisfy the inquiry condition. If none of those association degrees is greater than or equal to the first threshold, take the candidate skill domain with the maximum association degree, together with the target candidate skill domains, as the candidate skill domains that satisfy the inquiry condition. A target candidate skill domain is a candidate skill domain whose association degree differs from the maximum association degree by no more than a second threshold. N is a positive integer greater than 1.

In one possible implementation, the processor is further configured to:

if exactly one of the association degrees between the first N candidate skill domains and the voice demand information is greater than or equal to the first threshold, execute through the smart device the operation corresponding to the voice demand information according to the skill domain with that association degree. If none of those association degrees is greater than or equal to the first threshold and no target candidate skill domain exists, execute through the smart device the operation corresponding to the voice demand information according to the candidate skill domain with the maximum association degree.

In one possible implementation, the processor determines the association degree between a candidate skill domain and the voice demand information as follows:

for any candidate skill domain, determine a public-domain association degree between the candidate skill domain and the voice demand information through a first training model. Determine a private-domain association degree between them through a second training model. Then determine the association degree between the candidate skill domain and the voice demand information from the public-domain and private-domain association degrees, according to the weights allocated to each.

In one possible implementation, the processor is further configured to:

update the sample data of the first training model according to information crawled from the network; and/or update the sample data of the second training model according to historical voice demand information and the corresponding skill domains.

In one possible implementation, the processor is further configured to:

judge, according to the voice response information, whether the skill domain required by the user can be determined;

if it is determined that the skill domain corresponding to the played inquiry information is not the skill domain required by the user, judge whether any unused skill domain remains among the skill domains corresponding to the voice demand information;

if so, select one skill domain from the unused skill domains, play inquiry information about that skill domain to the user through the smart device, and return to the step of judging whether the skill domain required by the user can be determined; otherwise, stop playing inquiry information about skill domains corresponding to the voice demand information to the user through the smart device.

In a third aspect, an embodiment of the present invention further provides a network-side device for controlling a smart device, including:

at least one processing unit and at least one storage unit. The storage unit stores program code that, when executed by the processing unit, causes the processing unit to perform the functions of the embodiments of the first aspect.

In a fourth aspect, the present application further provides a computer storage medium storing a computer program that, when executed by a processor, implements the steps of the method of the first aspect.

For the technical effects of any implementation of the second to fourth aspects, refer to the technical effects of the corresponding implementations of the first aspect; they are not repeated here.

Drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention. A person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an application scenario according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a system for controlling a smart device according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of a method by which a network-side device determines candidate skill domains that satisfy the inquiry condition according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an application scenario in which a smart speaker determines the skill domain required by a user through multiple rounds of inquiry according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a first network-side device for controlling a smart device according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a second network-side device for controlling a smart device according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a third network-side device for controlling a smart device according to an embodiment of the present invention.

Detailed Description

To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.

Some of the terms used herein are explained below:

(1) "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.

(2) The "smart device" in the embodiments of the present invention refers to a smart terminal capable of biometric recognition, such as a smart speaker or a mobile phone.

(3) A "smart speaker" is an upgraded version of an ordinary speaker. Household consumers use it to access the Internet by voice, for example to request songs, shop online or check the weather forecast. It can also control smart home devices, for example opening curtains, setting the refrigerator temperature or preheating the water heater.

(4) The "network-side device" in the embodiments of the present invention refers to a device that can communicate with the smart device and provide it with services such as data processing, data storage and decision-making. An example is a cloud server (hereinafter "the cloud").

(5) A "skill domain" is a set of content grouped by data type. Examples include a song skill domain, a crosstalk skill domain, a novel skill domain, a medical skill domain and a weather skill domain.

(6) "NLP" (Natural Language Processing) in the embodiments of the present invention is a technology that lets computers analyze, understand and derive meaning from human language in a smart and useful way. Using NLP, developers can organize and structure knowledge to perform tasks such as automatic summarization, translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition and topic segmentation. In the embodiments of the present invention, NLP is used for text processing, classification, tagging, lexical analysis, parsing and the like.

The application scenarios described in the embodiments of the present invention illustrate the technical solutions of the embodiments more clearly and do not limit them. A person skilled in the art will appreciate that, as new application scenarios emerge, the technical solutions provided in the embodiments are equally applicable to similar technical problems. In the description of the present invention, "plurality" means two or more unless otherwise specified.

With the rapid development of science and technology in recent years, smart devices have sprung up like bamboo shoots after a spring rain and now pervade every aspect of daily life. Among them, smart home devices aimed at creating a higher-quality home life and living environment have entered a period of explosive growth. For example, smart speakers and mobile phones provide voice recognition and Internet access functions.

In practical use, a user can talk to a smart device with a voice recognition function, such as a smart speaker. The user can issue a control instruction asking the smart speaker to check the weather or play music. The user can also chat with the smart speaker or ask it questions, such as "How do I get to Happy Street?"

The smart speaker stays connected to the cloud over the network and sends the user's collected voice information to the cloud. The cloud searches for the corresponding content according to that voice information and then performs the corresponding operation, for example displaying a navigation map of Happy Street for the user.

In the scenario shown in FIG. 1, smart speakers currently on the market simply guess when determining the user's intention. For example, a user may say "I want to hear the Romance of the Three Kingdoms". The smart speaker cannot tell which kind of Three Kingdoms program the user means, because the request fits more than one skill domain. When the user's words can have multiple meanings, the system merely recommends one search result at random. Users therefore often feel that the smart speaker gives irrelevant answers, yet they do not know how to phrase a request so as to trigger the skill they need.

Therefore, the present invention provides a method for controlling a smart device, which works as follows.

1. The user speaks through a smart device such as a smart speaker, and the network-side device acquires this voice demand information.
2. The network-side device determines the corresponding skill domains according to the voice demand information. There may be one or more such skill domains.
3. When the network-side device cannot determine the user's intention, it plays inquiry information about a skill domain determined from the voice demand information through the smart speaker, for example "Are you asking about drama?".
4. On hearing the inquiry, the user gives the smart speaker positive or negative feedback. Positive feedback includes affirmative sentences or words such as "yes" and "OK". Negative feedback includes negative sentences or words such as "I don't want that" and "I don't care".
5. If the user's response is positive, the network-side device uses the voice response information collected by the smart speaker to determine that the skill domain corresponding to the played inquiry information is the skill domain required by the user. It then executes, through the smart speaker, the operation corresponding to the user's voice demand information according to that skill domain.

The embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
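As a rough illustration of how a response might be sorted into positive or negative feedback, the sketch below uses keyword matching on the recognized text. The phrase lists contain only the examples quoted in the description. The function name `classify_feedback` is hypothetical, and a real system would more likely use NLP than a fixed list. Negative phrases are checked first, so a response containing both kinds of phrase is read as negative.

```python
import re
from typing import Optional

# Illustrative phrases taken from the examples in the description only.
POSITIVE_PHRASES = ("yes", "ok")
NEGATIVE_PHRASES = ("don't want", "don't care")


def _contains(text: str, phrase: str) -> bool:
    # Whole-word match, so "ok" does not match inside "okay" or "book".
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def classify_feedback(response_text: str) -> Optional[bool]:
    """True for positive feedback, False for negative, None if undetermined."""
    text = response_text.lower()
    if any(_contains(text, p) for p in NEGATIVE_PHRASES):
        return False
    if any(_contains(text, p) for p in POSITIVE_PHRASES):
        return True
    return None
```

A `None` result corresponds to the case where the skill domain required by the user cannot yet be determined from the response.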

As shown in FIG. 2, an embodiment of the present invention provides a control system for a smart device, including a network-side device 10 and a smart device 20, where:

the network-side device 10 is configured to perform the following steps. It acquires the user's voice demand information through the smart device. It plays inquiry information about a skill domain corresponding to the voice demand information to the user through the smart device. It uses the user's voice response information, collected by the smart device, to determine that the skill domain corresponding to the played inquiry information is the skill domain required by the user. It then executes, through the smart device, the operation corresponding to the voice demand information according to the determined skill domain;

the smart device 20 is configured to send the collected voice demand information and voice response information of the user to the network-side device. It also plays to the user the inquiry information, received from the network-side device, about a skill domain corresponding to the voice demand information.
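The division of work between the two components can be pictured as a message exchange. The sketch below models it in-process. All message classes, the `NetworkSideSession` class and the `domains_for` hook are hypothetical; the patent defines no message format. Voice responses are assumed to be already classified as confirmed or rejected. A single skill domain is executed directly, and the optional confirmation inquiry is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Union


@dataclass
class VoiceDemand:      # smart device -> network-side device
    text: str


@dataclass
class VoiceResponse:    # smart device -> network-side device
    confirmed: bool     # assumed already classified as positive/negative


@dataclass
class Inquiry:          # network-side device -> smart device, played to the user
    domain: str


@dataclass
class Operation:        # network-side device -> smart device, executed for the user
    domain: str
    demand: str


Reply = Optional[Union[Inquiry, Operation]]


@dataclass
class NetworkSideSession:
    """One dialogue handled by the network-side device (illustrative only)."""
    # Hypothetical hook returning the skill domains to use for a demand.
    domains_for: Callable[[str], List[str]]
    _demand: str = field(default="", init=False)
    _pending: List[str] = field(default_factory=list, init=False)

    def on_message(self, msg: Union[VoiceDemand, VoiceResponse]) -> Reply:
        if isinstance(msg, VoiceDemand):
            self._demand = msg.text
            self._pending = list(self.domains_for(msg.text))
            if len(self._pending) == 1:
                return self._execute()  # single skill domain: execute directly
            return self._next_inquiry()
        if not self._pending:
            return None  # no inquiry outstanding
        if msg.confirmed:
            return self._execute()
        self._pending.pop(0)  # rejected: this skill domain is now used up
        return self._next_inquiry()

    def _execute(self) -> Operation:
        domain, self._pending = self._pending[0], []
        return Operation(domain, self._demand)

    def _next_inquiry(self) -> Reply:
        # None means no unused skill domain remains, so inquiries stop.
        return Inquiry(self._pending[0]) if self._pending else None
```

Keeping the dialogue state on the network side matches the description, in which the smart device only relays speech and plays what it receives.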

With this solution, the network-side device acquires the user's voice demand information and tries to determine the corresponding skill domain. If it cannot, it plays inquiry information about a skill domain corresponding to the voice demand information to the user through the smart device. Once the skill domain required by the user has been determined, the network-side device executes the operation corresponding to the user's voice demand information through the smart device according to that skill domain. Because the network-side device initiates an inquiry whenever the user's intention is unclear, the user can select the required skill domain. The accuracy with which the network-side device determines the user's intention is also improved.

The method by which the network-side device controls the smart device in the embodiments of the present application is described in detail below, taking a smart speaker and the cloud as examples:

In the technical solution of the embodiments of the present invention, the smart speaker collects the user's voice demand information. It can convert this information into text using ASR (Automatic Speech Recognition) technology and upload the converted text to the cloud. TTS (Text To Speech) technology performs the reverse conversion, from text to speech.

After receiving the voice demand information of the user sent by the intelligent sound box, the cloud end determines the alternative skill domain of the voice demand information, and determines the skill domain corresponding to the voice demand information from the alternative skill domain.

The cloud end comprises a plurality of technical domains, such as weather, maps, music, commentary, voices, radio stations, drama, books, speeches and the like.

When the cloud determines the alternative technical domain according to the voice demand information of the user, there are many possibilities, and the following description is given in different cases:

the first condition is as follows: the cloud determines an alternative technology domain;

for example, after the user inquires about 'how the temperature is in the next day' from the intelligent sound box, the cloud acquires the voice demand information 'how the temperature is in the next day' of the user through the intelligent sound box, obtains an alternative skill domain- 'weather' from the voice demand information of the user through the training model, and then the terminal takes the alternative skill domain 'weather' as the voice demand information of the user to obtain a corresponding skill domain, and executes the operation corresponding to the voice demand information through the intelligent equipment according to the skill domain 'weather'. For example, weather conditions on tomorrow or in the last week are played to the user.

In one possible manner, after taking "weather" as the skill domain corresponding to the user's voice demand information, the cloud plays inquiry information about this skill domain to the user through the smart speaker, such as "Are you asking about the weather?". After determining that the skill domain named in the played inquiry information is the skill domain the user requires, the cloud executes the operation corresponding to the voice demand information through the smart device according to the determined skill domain.

Case two: the cloud determines multiple candidate skill domains;

For example, the user says "I want to listen to Romance of the Three Kingdoms". The cloud receives this voice demand information collected by the smart speaker and determines that it has multiple candidate skill domains; for instance, the candidates determined from "I want to listen to Romance of the Three Kingdoms" include "drama", "pingshu", and "crosstalk".

When there are multiple candidate skill domains, one option is for the cloud to ask the user about every candidate skill domain in turn through the smart speaker, but this takes a long time and may need many rounds of polling before the user's need is hit. Alternatively, the cloud can select, according to the association degree between the user's voice demand information and each candidate skill domain, the candidates that satisfy an inquiry condition as the skill domains corresponding to the voice demand information (that is, the skill domains whose inquiry information may be played to the user through the smart device), thereby narrowing the range of candidate skill domains.

Further, as shown in the flowchart of fig. 3, the cloud determines the candidate skill domains that satisfy the inquiry condition as follows:

Step 300: the cloud determines that the voice demand information has multiple candidate skill domains;

Step 301: the cloud selects the top N skill domains from the multiple candidate skill domains according to the association degree between the voice demand information and each candidate skill domain;

Step 302: the cloud judges whether any of the top N candidate skill domains has an association degree not less than a first threshold; if so, go to step 303; otherwise, go to step 305;

Step 303: the cloud judges whether more than one of the top N candidate skill domains has an association degree not less than the first threshold; if so, go to step 304; otherwise, go to step 307;

Step 304: the cloud determines that the candidate skill domains whose association degrees are not less than the first threshold satisfy the inquiry condition;

Step 305: the cloud judges whether, among the top N candidate skill domains, there is a target candidate skill domain whose association degree differs from the maximum association degree by no more than a second threshold; if so, go to step 306; otherwise, go to step 307;

Step 306: the cloud determines that the candidate skill domain with the maximum association degree and the target candidate skill domain satisfy the inquiry condition;

Step 307: the cloud executes the operation corresponding to the voice demand information through the smart device according to the candidate skill domain (the single candidate that reaches the first threshold, or the candidate with the maximum association degree).
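The decision flow of fig. 3 can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the function name `select_inquiry_domains`, the `(action, domains)` return convention, and the default thresholds (0.9 and 0.15, borrowed from the examples in this description) are assumptions for illustration. Association degrees are assumed to arrive as a dictionary mapping each candidate skill domain to its score.

```python
from typing import Dict, List, Tuple


def select_inquiry_domains(
    scores: Dict[str, float],
    top_n: int = 2,
    first_threshold: float = 0.9,
    second_threshold: float = 0.15,
) -> Tuple[str, List[str]]:
    """Return ("execute", [domain]) or ("inquire", [domains, ...])."""
    if not scores:
        raise ValueError("no candidate skill domains")
    # Step 301: keep the top-N candidates, highest association degree first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    # Step 302: candidates whose association degree reaches the first threshold.
    above = [domain for domain, score in ranked if score >= first_threshold]
    if above:
        if len(above) > 1:
            # Steps 303 -> 304: several strong candidates, so ask the user.
            return ("inquire", above)
        # Steps 303 -> 307: exactly one strong candidate.
        return ("execute", above)
    # Step 305: target candidates lie within second_threshold of the maximum.
    best_domain, best_score = ranked[0]
    targets = [d for d, s in ranked[1:] if best_score - s <= second_threshold]
    if targets:
        # Step 306: the best candidate plus its targets satisfy the condition.
        return ("inquire", [best_domain] + targets)
    # Step 307: fall back to the candidate with the maximum association degree.
    return ("execute", [best_domain])


# The four examples used in this description.
print(select_inquiry_domains({"drama": 0.9, "pingshu": 0.8, "crosstalk": 0.5}))  # ('execute', ['drama'])
print(select_inquiry_domains({"drama": 0.9, "crosstalk": 0.95}))  # ('inquire', ['crosstalk', 'drama'])
print(select_inquiry_domains({"song": 0.6, "drama": 0.7}))  # ('inquire', ['drama', 'song'])
print(select_inquiry_domains({"song": 0.1, "drama": 0.7}))  # ('execute', ['drama'])
```

When the single-candidate branch returns `"execute"`, the cloud may still confirm with the user before acting, as described for case one; the sketch leaves that choice to the caller.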

For example, suppose the cloud selects the top 2 skill domains from the candidate skill domains. Continuing the example above, the candidate skill domains for the user's voice demand information "I want to listen to Romance of the Three Kingdoms" are "drama", "pingshu", and "crosstalk", and the association degrees between the voice demand information and these candidates are 0.9 for "drama", 0.8 for "pingshu", and 0.5 for "crosstalk".

Ranked by association degree, the top 2 skill domains are "drama" and "pingshu".

In a first possible manner, the cloud judges whether any of the association degrees between the user's voice demand information and the top N (N is a positive integer greater than 1) candidate skill domains is not less than the first threshold. Continuing the example, if the first threshold is 0.9, the cloud determines that "drama" is the only skill domain whose association degree is not less than the first threshold.

The cloud can play inquiry information about this candidate skill domain to the user through the smart speaker and, after determining that it is the skill domain the user requires, execute the operation corresponding to the voice demand information through the smart device according to it. For example, the cloud plays "Do you want to listen to the drama version of Romance of the Three Kingdoms?" to the user through the smart speaker; after receiving the user's affirmative voice response, the cloud plays the drama version of Romance of the Three Kingdoms to the user through the smart speaker. Alternatively, the cloud can treat the intention of the user's voice demand information as clear on the basis of the single candidate skill domain that reaches the first threshold, and execute the corresponding operation directly through the smart device without asking the user.

In a second possible manner, the cloud determines that several of the top N candidate skill domains have association degrees not less than the first threshold, and takes all of them as candidate skill domains satisfying the inquiry condition. The cloud can then conduct multiple rounds of inquiry, playing inquiry information about the skill domains corresponding to the voice demand information to the user in order to determine the skill domain the user requires. When playing inquiry information through the smart speaker, the cloud can pick one of the qualifying candidate skill domains at random, or pick them in order of association degree.

If, according to the user's voice response, the cloud determines that the skill domain named in the played inquiry information is not the skill domain the user requires, it judges whether any unused skill domain (one not yet asked about) remains among the skill domains corresponding to the voice demand information. If so, it selects one of the unused skill domains, plays inquiry information about it to the user through the smart device, and returns to the step of judging whether the skill domain required by the user can be determined; otherwise, it stops playing inquiry information to the user through the smart device.

For example, in the scenario shown in fig. 4, the cloud determines that the top two candidate skill domains are "drama", with an association degree of 0.9, and "crosstalk", with an association degree of 0.95. The cloud first selects "crosstalk" according to association degree and plays "Do you want to listen to crosstalk?" to the user through the smart speaker. If the user answers "yes", the cloud plays the crosstalk version of Romance of the Three Kingdoms through the smart speaker. If the user answers "no", the cloud judges whether an unused skill domain ("drama") remains and plays "Do you want to listen to the drama?" through the smart speaker. If the user answers "yes", the cloud plays the drama version of Romance of the Three Kingdoms; if the user answers "no" again, the cloud determines that no unused skill domain remains and stops playing inquiry information to the user through the smart device, or prompts the user "I don't understand what you said".
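The multi-round inquiry of fig. 4 amounts to asking about the qualifying candidates one at a time until the user confirms one or none remain. The sketch below is illustrative only: the function name `inquire_until_confirmed` and the callback `ask(domain)`, which is assumed to play the inquiry through the smart speaker and return `True` for an affirmative response, are hypothetical. Ordering by association degree is one of the two selection options described above (the other being random choice).

```python
from typing import Callable, Dict, List, Optional


def inquire_until_confirmed(
    candidates: List[str],
    scores: Dict[str, float],
    ask: Callable[[str], bool],
) -> Optional[str]:
    """Ask about each unused candidate, highest association degree first."""
    unused = sorted(candidates, key=lambda d: scores[d], reverse=True)
    while unused:
        domain = unused.pop(0)  # this candidate is now "used"
        if ask(domain):
            return domain  # the user confirmed this skill domain
    return None  # no unused skill domain left: stop asking


# Scripted user for the fig. 4 example: says "no" to crosstalk, "yes" to drama.
def scripted_user(domain: str) -> bool:
    print(f"Do you want to listen to {domain}?")
    return domain == "drama"


choice = inquire_until_confirmed(
    ["drama", "crosstalk"], {"drama": 0.9, "crosstalk": 0.95}, scripted_user
)
print(choice if choice else "I don't understand what you said")
```

In a real deployment the callback would also classify the user's reply as positive or negative; here it is replaced by a scripted stub so the control flow can be followed in isolation.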

In a third possible manner, the cloud determines that none of the association degrees between the top N candidate skill domains and the voice demand information reaches the first threshold, that is, all of them are below the first threshold. The cloud then determines a target candidate skill domain by comparing the maximum association degree among the top N candidates with the association degrees of the other top-N candidates, and takes the candidate with the maximum association degree together with the target candidate skill domain as the candidate skill domains satisfying the inquiry condition.

The cloud determines the target candidate skill domain as follows:

If the difference between the maximum association degree among the top N candidate skill domains and the association degree of another top-N candidate is not greater than the second threshold, the cloud determines that candidate to be a target candidate skill domain.

For example, assume the second threshold is 0.15 and the cloud determines that the top two candidate skill domains are "song", with an association degree of 0.6, and "drama", with an association degree of 0.7. The difference between the association degree of "song" (0.6) and the maximum association degree (0.7, "drama") is 0.1, which is not greater than 0.15, so the cloud determines that "song" is a target candidate skill domain, and that the target candidate ("song") and the candidate with the maximum association degree ("drama") both satisfy the inquiry condition.

Once the cloud has determined that several candidate skill domains satisfy the inquiry condition, it proceeds with the specific steps of the second possible manner, which are not repeated here.

In a fourth possible manner, the cloud determines that none of the association degrees between the top N candidate skill domains and the voice demand information reaches the first threshold, and that no target candidate skill domain exists; that is, all association degrees of the top N candidates are below the first threshold, and the difference between the maximum association degree and each of the other association degrees is greater than the second threshold. In this case the network side device executes the operation corresponding to the voice demand information through the smart device according to the candidate skill domain with the maximum association degree.

For example, assume the second threshold is 0.15 and the top two candidate skill domains are "song", with an association degree of 0.1, and "drama", with an association degree of 0.7. The maximum association degree is 0.7, and its difference from the association degree of "song" is 0.6, which is greater than 0.15, so the cloud determines that no target candidate skill domain exists. The cloud then executes the operation corresponding to the voice demand information through the smart device according to the candidate with the maximum association degree ("drama"), or follows the specific implementation of the first possible manner, which is not repeated here.

In the embodiment of the invention, after obtaining the text of the user's voice demand information uploaded by the smart speaker, the cloud parses it with NLP (Natural Language Processing): it segments the text into words and inputs the segmentation result into a first training model of the public domain and a second training model of the private domain. The two models parse the input into structured data that a machine can recognize, and determine the candidate skill domains corresponding to the voice demand information and the association degree between the voice demand information and each candidate skill domain.

NLP word segmentation is illustrated by the following example:

NLP studies how to make computers read human language. For example, in daily life one occasionally meets an uncommon character whose pronunciation one does not know and turns to a search engine, describing the character as, say, four copies of a simpler character and asking how it is read. The search engine returns the matching character itself, rather than results for the literal words "4" and the component character or their surface meaning; this is NLP reading human language to understand the user's real intention.

In the embodiment of the invention, for the text "I want to listen to Romance of the Three Kingdoms", NLP can determine that "listen" is a verb and "Romance of the Three Kingdoms" is a noun, and that the user intends to play audio related to "Romance of the Three Kingdoms". The cloud then inputs the NLP segmentation result "Romance of the Three Kingdoms" into the first training model and the second training model, which determine its candidate skill domains by comparison with the stored sample data.
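The patent does not say which segmenter it uses. As an illustration only, the sketch below swaps in forward maximum matching, a classic dictionary-based Chinese word segmentation method, with a tiny hypothetical lexicon that also stores a part-of-speech tag per word; the lexicon contents and the `segment` function name are assumptions. Applied to 我想听三国演义 ("I want to listen to Romance of the Three Kingdoms"), it keeps 三国演义 together as one noun instead of splitting off the shorter 三国.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: word -> part of speech.
LEXICON: Dict[str, str] = {
    "我": "pronoun",     # I
    "想": "verb",        # want
    "听": "verb",        # listen
    "三国": "noun",      # Three Kingdoms
    "三国演义": "noun",  # Romance of the Three Kingdoms
}


def segment(text: str, lexicon: Dict[str, str], max_len: int = 4) -> List[Tuple[str, str]]:
    """Forward maximum matching: at each position take the longest lexicon word."""
    words: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing longer matches.
            if piece in lexicon or size == 1:
                words.append((piece, lexicon.get(piece, "unknown")))
                i += size
                break
    return words


tokens = segment("我想听三国演义", LEXICON)
print(tokens)  # [('我', 'pronoun'), ('想', 'verb'), ('听', 'verb'), ('三国演义', 'noun')]
print([w for w, pos in tokens if pos == "noun"])  # ['三国演义']
```

Only the noun would be passed on to the training models in this sketch; statistical segmenters (for example HMM-based ones, in line with the models mentioned later) handle unknown words better than a fixed lexicon.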

Suppose the sample data stored in the first training model maps the classic old song "Romance of the Three Kingdoms" to the "song" skill domain with an association degree of 0.8. By comparison, the first training model determines that the association degree between "Romance of the Three Kingdoms" and the classic old song "Romance of the Three Kingdoms" is 0.8, and therefore that the association degree between "Romance of the Three Kingdoms" and "song" is 0.8 × 0.8 = 0.64.
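The chained scoring in this example (the query's similarity to a stored sample, multiplied by that sample's association with its skill domain) can be written as below. The function name `domain_association` and the rule of keeping the highest score when several samples point to the same skill domain are assumptions; the description only gives the single-sample product.

```python
from typing import Dict, Tuple


def domain_association(
    sample_similarity: Dict[str, float],
    sample_labels: Dict[str, Tuple[str, float]],
) -> Dict[str, float]:
    """Score each skill domain as similarity(query, sample) * association(sample, domain)."""
    scores: Dict[str, float] = {}
    for sample, similarity in sample_similarity.items():
        domain, label_association = sample_labels[sample]
        score = similarity * label_association
        # Assumption: keep the best-scoring sample per skill domain.
        scores[domain] = max(scores.get(domain, 0.0), score)
    return scores


# Stored sample from the example: the classic old song, labelled "song" at 0.8.
labels = {"classic old song Romance of the Three Kingdoms": ("song", 0.8)}
# The query "Romance of the Three Kingdoms" matches that sample at 0.8.
similarity = {"classic old song Romance of the Three Kingdoms": 0.8}
print(domain_association(similarity, labels))  # song ≈ 0.64
```

The same product rule gives 0.95 × 0.9 = 0.855 for the "hot news of company A" example in the public-domain discussion.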

The public domain and private domain involved in the technical solution of the embodiment of the invention are introduced below:

1) Public domain.

The public domain is an information set oriented to all users; its information is shared by any user who can connect to the cloud through a third-party medium.

A third-party medium can be understood as a physical or virtual device or medium capable of communicating with the cloud.

For example, the user connects to the cloud through a smart speaker.

The cloud updates the sample data of the first training model in the public domain by crawling network information. Specifically, the cloud obtains information by crawling major portal websites, adds the words or sentences contained in the obtained information to the corresponding skill domains in the public domain, and updates the thresholds of those words or sentences within the skill domain according to the timeliness of the information.

In one possible manner, the cloud crawls news from major portal websites using crawler technology, and the first training model can raise the association degree between the newly crawled information and its skill domain; the next time the cloud receives a request about that skill domain from a user through the smart speaker, it can first recommend the information with the highest association degree in that skill domain.

For example, when the cloud updates the sample information of the first training model in the public domain, it obtains the news that company A has released a new mobile phone, model product A. The cloud adds product A to the "electronic products" skill domain, and the first training model determines that the association degree between product A and electronic products is 0.9. The cloud uses this information as sample data for the first training model in the public domain.

When the smart speaker collects the user's voice demand information "hot news of company A", the cloud determines through the first training model that the voice demand information has multiple candidate skill domains, including "electronic products" and "current-affairs news". The association degree between "product A" and "electronic products" is 0.9, and the first training model determines that the association degree between "hot news of company A" and "product A" is 0.95, so the association degree between "hot news of company A" and "electronic products" is 0.95 × 0.9 = 0.855. The association degree between "hot news of company A" and "current-affairs news" is determined in a similar way and is not described here.

The cloud plays to the user through the smart speaker: "Do you want to ask about current-affairs news of company A?" If the user gives a negative voice response, the cloud goes on to ask, through the smart speaker, about company A's electronic products; if the user then gives a positive voice response, the cloud plays the product information of product A to the user through the smart speaker.

It should be noted that the above way of calculating association degrees in the first and second training models is only an example; any method of scoring or determining association degrees is applicable to the invention.

In another possible manner, when updating the sample information of the first training model in the public domain, the cloud stores each crawled message in the database of its skill domain, at the top of that database's ranking, so that the cloud retrieves the message first when searching that skill domain; the first training model can also raise the association degree between the updated skill domain and related voice demand information.

For example, the skill domains in the cloud include education news, financial news, military news, and so on. One message the cloud obtains through the crawler is "nutritional lunches in primary and secondary schools", which the cloud stores in the database of the "education news" skill domain. When the cloud, through the smart speaker, collects the user's request "today's hot news", it determines through the first training model that the skill domains corresponding to the user's voice demand information are education news, financial news, and military news. Because only education news was updated this time, its association degree with the user's voice demand information is the largest. After determining that the user is asking about education news, the cloud retrieves the latest information in the education news skill domain and plays the content about nutritional lunches in primary and secondary schools to the user through the smart speaker.
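One way to model the update described above is a per-skill-domain store in which each newly crawled message is placed at the front, and the most recently updated domain wins when several candidates match a vague request such as "today's hot news". This is a sketch under assumptions: the class `PublicDomainStore` and its recency counter are hypothetical, and the description does not say how the boost in association degree is actually computed.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class PublicDomainStore:
    """Hypothetical per-skill-domain store for crawled public-domain messages."""

    def __init__(self, domains: List[str]) -> None:
        self.messages: Dict[str, Deque[str]] = {d: deque() for d in domains}
        self.last_update: Dict[str, int] = {d: 0 for d in domains}
        self._clock = 0  # increases with every crawl update

    def add_crawled(self, domain: str, message: str) -> None:
        # The newest message goes to the front of its domain's ranking.
        self.messages[domain].appendleft(message)
        self._clock += 1
        self.last_update[domain] = self._clock

    def freshest(self, candidates: List[str]) -> str:
        # Assumption: the most recently updated candidate domain ranks highest.
        return max(candidates, key=lambda d: self.last_update[d])

    def latest(self, domain: str) -> Optional[str]:
        queue = self.messages[domain]
        return queue[0] if queue else None


news = ["education news", "financial news", "military news"]
store = PublicDomainStore(news)
store.add_crawled("education news", "nutritional lunches in primary and secondary schools")
domain = store.freshest(news)  # request: "today's hot news"
print(domain, "->", store.latest(domain))
```

A `deque` is used because `appendleft` puts the newest item at the top of the ranking in constant time.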

It should be noted that the above is only a specific description of the first training model and the second training model, and does not exclude the steps and methods performed before or after information is input into them.

As for update timing, the cloud may update the public domain information periodically, for example every morning.

2) Private domain.

A private domain is built for the exclusive use of a single user, and thus provides the most effective control over data, security, and quality of service.

The cloud updates the private domain based on the records of the personal account: it updates the sample data of the second training model according to the historical voice demand information, and the corresponding skill domains, in the private domain associated with the user identifier.

The user identifier may be the device identifier of the smart device, the identifier of another smart terminal through which the smart device is networked, or a personal account that the user has created in an APP able to connect to the cloud.

Taking the smart speaker as an example, if the user identifier is the device identifier of the smart speaker, the smart speaker can upload the device identifier to the cloud via TTS technology.

Updating the sample data of the second training model in the private domain is described below:

For example, continuing the example above, after the cloud plays the product information of product A to the user through the smart speaker, it updates the sample data of the second training model in the private domain by labeling "product A" with the skill domain "electronic products" (that is, adding product A to the skill domain database of electronic products); suppose the second training model determines that the association degree between "product A" and "electronic products" is 0.9. The cloud uses this information as sample data for the second training model in the private domain.

It should be noted that, when updating the private domain information according to historical voice demand information, the cloud only uses voice demand information for which the user expressed positive feedback; when the user expressed negative feedback, the cloud does not update the private domain with that usage record.
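The positive-feedback rule can be sketched as a per-user sample store that records a new sample only when the feedback is positive. The `PrivateDomain` class, its fields, and the example identifier are hypothetical, and the positive/negative judgment itself (made from tone and converted text, as described next) is assumed to happen elsewhere and be passed in as a boolean.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PrivateDomain:
    """Hypothetical per-user sample store backing the second training model."""

    user_id: str
    samples: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def record(self, phrase: str, domain: str, association: float, positive: bool) -> bool:
        """Add a sample only when the user's feedback was positive."""
        if not positive:
            return False  # negative feedback: the usage record is not used
        self.samples.setdefault(phrase, {})[domain] = association
        return True


private = PrivateDomain(user_id="speaker-001")  # e.g. a device identifier
private.record("product A", "electronic products", 0.9, positive=True)
private.record("hot news of company A", "current-affairs news", 0.5, positive=False)
print(private.samples)  # {'product A': {'electronic products': 0.9}}
```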

The cloud determines whether the user's feedback is positive or negative from the user's voice response, based on the tone judged by the smart speaker and the text converted by the smart speaker.

After receiving the user's voice demand information collected by the smart speaker, the cloud determines the candidate skill domains and their association degrees with the voice demand information through the public domain and the private domain separately. Specifically, the cloud determines the public-domain association degree of each candidate skill domain through the first training model and the private-domain association degree through the second training model, and then determines the association degree between each candidate skill domain and the voice demand information according to the allocation weights of the public-domain and private-domain association degrees.

When the cloud determines the association degree between a candidate skill domain and the voice demand information from the allocation weights of the public-domain and private-domain association degrees, two cases arise:

Case one: there is no relevant usage record in the private domain;

For example, when a user uses the smart speaker for the first time and says "I want to listen to Romance of the Three Kingdoms", the cloud obtains this voice demand information through the smart speaker and determines through the first training model of the public domain that the candidate skill domains are "drama" and "crosstalk", with public-domain association degrees of 0.8 and 0.7 respectively. Finding no usage record of related words in the private domain, the cloud determines the association degrees from the public domain alone: 0.8 for "drama" and 0.7 for "crosstalk".

Case two: there is a relevant usage record in the private domain;

Continuing the example above, if the user often uses the smart speaker to play the crosstalk version of Romance of the Three Kingdoms, the private domain under the user's ID contains sample data associating "Romance of the Three Kingdoms" with "crosstalk" at an association degree of 0.9. When the user again says "I want to listen to Romance of the Three Kingdoms", the cloud determines through the first training model of the public domain that the candidate skill domains are "drama" (public-domain association degree 0.8) and "crosstalk" (0.7); through the second training model of the private domain, it determines the candidate skill domain "crosstalk" with a private-domain association degree of 0.9. The cloud then determines the association degree between each candidate skill domain and the voice demand information according to the allocation weights of the public-domain and private-domain association degrees.

Assuming an allocation weight of 6:4 (public domain : private domain), the cloud determines that the association degree between "I want to listen to Romance of the Three Kingdoms" and "drama" is 0.8 × 60% = 0.48, and the association degree with "crosstalk" is 0.7 × 60% + 0.9 × 40% = 0.78.
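The weighting in both cases can be reproduced with a short function. This is a sketch under assumptions: the name `combine_association` is hypothetical, the 0.6/0.4 defaults come from the 6:4 example, and a skill domain missing from the private domain is assumed to contribute 0 there, which is what yields 0.48 for "drama".

```python
from typing import Dict


def combine_association(
    public: Dict[str, float],
    private: Dict[str, float],
    public_weight: float = 0.6,
    private_weight: float = 0.4,
) -> Dict[str, float]:
    """Merge public-domain and private-domain association degrees."""
    if not private:
        # Case one: no relevant private record, use the public scores as-is.
        return dict(public)
    # Case two: weighted sum; a domain absent on one side contributes 0 there.
    domains = sorted(set(public) | set(private))
    return {
        d: public.get(d, 0.0) * public_weight + private.get(d, 0.0) * private_weight
        for d in domains
    }


public_scores = {"drama": 0.8, "crosstalk": 0.7}
print(combine_association(public_scores, {}))  # case one: drama 0.8, crosstalk 0.7
print(combine_association(public_scores, {"crosstalk": 0.9}))  # case two: crosstalk ≈ 0.78, drama ≈ 0.48
```

With the private record in place, "crosstalk" overtakes "drama", which is the effect the private domain is meant to have for a returning user.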

In the embodiment of the present application, the first training model and the second training model use Hidden Markov Models (HMMs), but the present application does not limit the type of training model.

Optionally, the smart speaker of this application can also be networked with an APP (Application), and the APP can be networked with the cloud.

The APP can push to the user the hit statistics of each skill domain in the user's human-computer interactions over a period of time, and the user can view and edit personal intention-hit hot words.

Regarding hot words: hot words have a higher priority than ordinary ASR results, so the smart speaker matches hot words first when recognizing voice information.

For example, if the hot word the user has edited through the APP is "Haigui" (returnee from overseas), then when the user's speech contains "haigui", which is pronounced the same as the Chinese word for "sea turtle", the smart speaker preferentially transcribes it as the hot word "Haigui".
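Hot-word priority can be illustrated as a rescoring step over several ASR hypotheses: the first hypothesis containing a user hot word wins, otherwise the top hypothesis is kept. This is a sketch under assumptions: the function `pick_transcript` and the n-best-list input are hypothetical, and the description does not say how the smart speaker integrates hot words into recognition. In the example, 海归 ("Haigui", returnee from overseas) and 海龟 (sea turtle) are both pronounced haigui.

```python
from typing import List, Set


def pick_transcript(hypotheses: List[str], hot_words: Set[str]) -> str:
    """Return the first ASR hypothesis containing a hot word, else the top one."""
    if not hypotheses:
        raise ValueError("no ASR hypotheses")
    for text in hypotheses:
        if any(word in text for word in hot_words):
            return text
    return hypotheses[0]


# Hypothetical n-best list for the spoken "ta shi haigui" ("he is a ...").
n_best = ["他是海龟", "他是海归"]  # "sea turtle" ranked first, "returnee" second
print(pick_transcript(n_best, {"海归"}))  # 他是海归
print(pick_transcript(n_best, set()))  # 他是海龟
```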

The smart speaker is provided with an online TTS (Text To Speech) processing module, which uploads the user's mobile phone number identifier and the text to be broadcast to the cloud, and broadcasts by voice the information returned by the cloud.

Based on the same inventive concept, an embodiment of the present invention further provides a network side device for controlling a smart device. Since this device is the device in the control system of the smart device in the embodiment of the present invention, and its principle for solving the problem is similar to that of the method, its implementation can refer to the implementation of the method, and repeated details are omitted.

As shown in fig. 5, an embodiment of the present invention provides a network side device for controlling a smart speaker, including a processor 500 and a transceiver 501:

the processor 500 is configured to: collect the user's voice demand information through the smart device; play inquiry information about the skill domain corresponding to the voice demand information to the user through the smart device; and, after determining from the user's voice response collected by the smart device that the skill domain named in the played inquiry information is the skill domain required by the user, execute the operation corresponding to the voice demand information through the smart device according to the determined skill domain.

Optionally, the processor 500 is further configured to:

after collecting the user's voice demand information through the smart device, determine candidate skill domains according to the voice demand information; if one candidate skill domain is determined, take it as the skill domain corresponding to the voice demand information; or, if multiple candidate skill domains are determined, take the candidate skill domains satisfying the inquiry condition as the skill domains corresponding to the voice demand information.

Optionally, the processor 500 determines the candidate skill domains satisfying the inquiry condition as follows:

Optionally, the processor 500 is further configured to:

Optionally, the processor 500 determines the association degree between a candidate skill domain and the voice demand information as follows:

Optionally, the processor 500 is further configured to:

In one possible implementation, the processor 500 is further configured to:

Optionally, the processor 500 is further configured to:

As shown in fig. 6, an embodiment of the present invention provides a network side device for controlling a smart speaker, including:

at least one processing unit 600 and at least one memory unit 601, wherein the memory unit 601 stores program code that, when executed by the processing unit 600, causes the processing unit 600 to perform the following:

collecting the user's voice demand information through the smart device; playing inquiry information about the skill domain corresponding to the voice demand information to the user through the smart device; and, after determining from the user's voice response collected by the smart device that the skill domain named in the played inquiry information is the skill domain required by the user, executing the operation corresponding to the voice demand information through the smart device according to the determined skill domain.

Optionally, the processing unit 600 is further configured to:

Optionally, the processing unit 600 determines the candidate skill domain satisfying the query condition by:

Optionally, the processing unit 600 is further configured to:

Optionally, the processing unit 600 determines the association degree between the candidate skill domain and the voice demand information as follows:

Optionally, the processing unit 600 is further configured to:

Optionally, the processing unit 600 is further configured to:

Based on the same inventive concept, an embodiment of the present invention further provides a control method for an intelligent device. The method corresponds to the devices in the control system of the intelligent device described above and solves the problem on a similar principle. Its implementation can therefore refer to that of the control system, and repeated details are omitted.

As shown in fig. 7, a method for controlling an intelligent device according to an embodiment of the present invention includes:

Step 700: the network-side device collects the voice demand information of the user through the intelligent device;

Step 701: the network-side device plays, to the user through the intelligent device, inquiry information about the skill domain corresponding to the voice demand information;

Step 702: after determining, according to the user's voice response information collected by the intelligent device, that the skill domain corresponding to the played inquiry information is the skill domain required by the user, the network-side device executes the operation corresponding to the voice demand information through the intelligent device according to the determined skill domain.
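Steps 700 to 702 can be pictured as a single inquiry round, shown below as a minimal Python sketch under several assumptions:

- `ask` stands in for playing the inquiry through the intelligent device and returning the recognised voice response.
- `execute` stands in for performing the operation in the chosen skill domain.
- The voice demand has already been recognised as text (step 700).
- The reply is matched to a domain by a naive substring test.

All names are hypothetical, and the patent does not specify how the response is matched.

```python
from typing import Callable, List, Optional


def inquire_and_execute(
    demand: str,
    candidates: List[str],
    ask: Callable[[str], str],
    execute: Callable[[str, str], str],
) -> Optional[str]:
    """Sketch of steps 700-702; `demand` is the already-recognised voice demand."""
    # Step 701: play one inquiry that lists the candidate skill domains.
    question = "Do you mean " + " or ".join(candidates) + "?"
    reply = ask(question).lower()
    # Step 702: the domain named in the user's voice response is taken as the
    # required skill domain (naive substring match, assumed for illustration).
    for domain in candidates:
        if domain.lower() in reply:
            return execute(domain, demand)
    # No domain recognised in the reply; a real system might ask again.
    return None
```

Passing the device interactions in as callables keeps the sketch testable without any speech stack. The inquiry and execution steps can be replaced by fakes, as in a unit test.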

Optionally, after the network-side device collects the voice demand information of the user through the intelligent device, and before the inquiry information about the skill domain corresponding to the voice demand information is played to the user through the intelligent device, the method further includes:

Optionally, the network-side device determines the candidate skill domains that satisfy the query condition as follows:

the network-side device selects the top N skill domains from the plurality of candidate skill domains according to the association degrees between the voice demand information and the candidate skill domains;

wherein N is a positive integer greater than 1.
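Selecting the first N candidate skill domains by association degree is a standard top-N selection. Below is a minimal Python sketch. It assumes the association degrees are already available as a mapping from domain name to a numeric score; `top_n_domains` is a hypothetical name.

```python
import heapq
from typing import Dict, List


def top_n_domains(scores: Dict[str, float], n: int) -> List[str]:
    """Return the N candidate skill domains with the highest association degrees."""
    if n <= 1:
        # The method requires N to be a positive integer greater than 1.
        raise ValueError("N must be a positive integer greater than 1")
    # heapq.nlargest returns the N largest items in descending order of score.
    return heapq.nlargest(n, scores, key=scores.__getitem__)
```

With `scores = {"music": 0.62, "story": 0.55, "weather": 0.10, "news": 0.30}` and `n = 2`, the result is `["music", "story"]`.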

Optionally, if only one of the association degrees between the top N candidate skill domains and the voice demand information is not less than the first threshold, the network-side device executes, through the intelligent device, the operation corresponding to the voice demand information according to the skill domain with that association degree; or
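The first-threshold rule can be sketched as a check that runs before any inquiry is played. In this minimal Python sketch, `direct_domain` is hypothetical, and the threshold value is left to the caller because the source does not give one. The alternative branch after "or" is not reproduced in this section. Returning `None` to mean "fall back to inquiring the user" is therefore an assumption.

```python
from typing import Dict, List, Optional


def direct_domain(
    scores: Dict[str, float], top_n: List[str], first_threshold: float
) -> Optional[str]:
    """Return the domain to execute directly, or None if an inquiry is still needed."""
    # Association degrees "not less than" the first threshold, i.e. >=.
    above = [d for d in top_n if scores[d] >= first_threshold]
    if len(above) == 1:
        # Only one clear winner: execute in that domain without asking.
        return above[0]
    # Zero or several domains pass: assumed to fall back to the inquiry.
    return None
```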

Optionally, the network-side device determines the association degree between the candidate skill domain and the voice demand information as follows:

Optionally, the network-side device updates the sample data of the first training model according to information crawled from the network; and/or the network-side device updates the sample data of the second training model according to historical voice demand information and the corresponding skill domains.
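The two update paths can be illustrated as two separate sample pools. In this minimal Python sketch, `TrainingSamples` and its methods are hypothetical. The sketch assumes that private-domain samples are kept per user, and that a skill domain confirmed by the user's voice response is recorded as a new historical sample. The patent does not specify the storage layout or how retraining is triggered.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List, Tuple


@dataclass
class TrainingSamples:
    # First training model (public domain): samples built from crawled web text.
    public: List[Tuple[str, str]] = field(default_factory=list)
    # Second training model (private domain): assumed per-user history of
    # (voice demand, skill domain) pairs.
    private: DefaultDict[str, List[Tuple[str, str]]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def add_crawled(self, text: str, domain: str) -> None:
        """Add a sample derived from information crawled from the network."""
        self.public.append((text, domain))

    def add_history(self, user_id: str, demand: str, domain: str) -> None:
        """Add a historical voice demand and the skill domain it resolved to."""
        self.private[user_id].append((demand, domain))
```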

Optionally, after the network-side device plays, to the user through the intelligent device, the inquiry information about the skill domain corresponding to the voice demand information, the method further includes:

The present application is described above with reference to block diagrams and/or flowchart illustrations of methods, apparatus (systems) and/or computer program products according to embodiments of the application. It will be understood that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, and/or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.

Accordingly, the subject application may also be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.). Furthermore, the application may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. In the context of this application, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A control method of an intelligent device is characterized by comprising the following steps:

the network-side device determines, according to the user's voice response information collected by the intelligent device, that the skill domain corresponding to the played inquiry information is the skill domain required by the user, and then executes the operation corresponding to the voice demand information through the intelligent device according to the determined skill domain;

the network-side device determines, through a first training model, a public-domain association degree between a candidate skill domain and the voice demand information; determines, through a second training model, a private-domain association degree between the candidate skill domain and the voice demand information; and determines the association degree between the candidate skill domain and the voice demand information according to the weights assigned to the public-domain association degree and the private-domain association degree, wherein the candidate skill domain is determined by the network-side device according to the voice demand information;
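Combining the public-domain and private-domain association degrees by assigned weights can be sketched as a weighted sum. This minimal Python sketch rests on three assumptions, none of which the patent states:

- The combination is linear.
- The two weights sum to 1.
- A domain scored by only one model receives 0 from the other.

`association_degree`, `fuse`, and the default weight of 0.5 are hypothetical.

```python
from typing import Dict


def association_degree(
    public_score: float, private_score: float, public_weight: float = 0.5
) -> float:
    """Weighted combination of public- and private-domain association degrees."""
    if not 0.0 <= public_weight <= 1.0:
        raise ValueError("public_weight must lie in [0, 1]")
    # Assumption: the private-domain weight is the complement of the public one.
    return public_weight * public_score + (1.0 - public_weight) * private_score


def fuse(
    public: Dict[str, float], private: Dict[str, float], public_weight: float = 0.5
) -> Dict[str, float]:
    """Combine per-domain scores; a domain missing from one model scores 0 there."""
    domains = sorted(public.keys() | private.keys())
    return {
        d: association_degree(public.get(d, 0.0), private.get(d, 0.0), public_weight)
        for d in domains
    }
```

Raising the private-domain weight lets a user's own history outweigh general web knowledge. This is one plausible reason to keep the two models separate.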

wherein the sample data of the first training model is updated by the network-side device according to information crawled from the network; and/or the sample data of the second training model is updated by the network-side device according to historical voice demand information and the corresponding skill domains.

2. The method of claim 1, wherein after the network-side device collects the voice demand information of the user through the intelligent device, and before the inquiry information about the skill domain corresponding to the voice demand information is played to the user through the intelligent device, the method further comprises:

3. The method of claim 2, wherein the network-side device determines the candidate skill domains that satisfy the query condition as follows:

wherein N is a positive integer greater than 1.

4. The method of claim 3, further comprising:

if only one of the association degrees between the top N candidate skill domains and the voice demand information is not less than the first threshold, the network-side device executes, through the intelligent device, the operation corresponding to the voice demand information according to the skill domain with that association degree; or

5. The method according to any one of claims 1 to 4, wherein after the network-side device plays, to the user through the intelligent device, the inquiry information about the skill domain corresponding to the voice demand information, the method further includes:

6. A network-side device for controlling an intelligent device, comprising a processor and a transceiver, wherein:

the processor is used for executing the method of any one of claims 1 to 5.

7. A network side device for controlling an intelligent device, the device comprising: at least one processing unit and at least one memory unit, wherein the memory unit stores program code which, when executed by the processing unit, causes the processing unit to perform the steps of the method of any of claims 1 to 5.

8. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.