CN113593565B

CN113593565B - Intelligent home device management and control method and system

Info

Publication number: CN113593565B
Application number: CN202111147183.6A
Authority: CN
Inventors: 黄晓娜
Original assignee: Shenzhen Da Sheng Jia Technology Co ltd
Current assignee: Shenzhen Da Sheng Jia Technology Co ltd
Priority date: 2021-09-29
Filing date: 2021-09-29
Publication date: 2021-12-17
Anticipated expiration: 2041-09-29
Also published as: CN113593565A

Abstract

The invention provides an intelligent home device control method, which comprises the following steps: receiving user voice information; decomposing the voice information, extracting features through a neural network model, identifying the voice characteristics of the user, and separating the user utterance; inputting the user utterance into a speech recognition model for speech recognition to generate text information; performing semantic analysis on the generated text information to obtain semantic information, which comprises the intelligent device, the operation, and the function state; and identifying the intelligent device, the operation, and the function state according to the semantic information, sending a control instruction to the corresponding intelligent device, and causing that device to execute the corresponding operation and enter the corresponding function state. In this method, the voice characteristics of the user are first identified within mixed noise and the user utterance is separated, and only then are speech recognition and semantic analysis carried out. The intelligent device, operation, and function state in the user's voice instruction are therefore obtained accurately, so that the intelligent device can be controlled accurately and the user experience is improved.

Description

Intelligent home device management and control method and system

Technical Field

The invention belongs to the field of intelligent home equipment, and particularly relates to a method and a system for managing and controlling intelligent home equipment.

Background

With the development of computer network technology, artificial intelligence is being applied more and more widely, for example in manufacturing, entertainment, and trade, and many industries are devoting research effort to artificial intelligence, especially for smart homes.

At present, many intelligent household products can meet people's daily needs. The user can control and use a smart home product through a user interface running on a terminal device (such as a handheld terminal or a smart phone). However, different smart home products often have different terminal devices or user interfaces, so a user often needs to use several terminal devices or install several clients. For the user, the operation is complex, the degree of intelligence is insufficient, and the experience is poor. Control of intelligent devices by voice appeared later and made control increasingly convenient, but voice control is often inaccurate, which also troubles users.

Disclosure of Invention

The invention aims to overcome the defects in the prior art and provides an intelligent home device control method.

The invention adopts the following technical scheme:

an intelligent home device management and control method comprises the following steps:

receiving user voice information;

decomposing the voice information, extracting the characteristics of the decomposed voice information through a neural network model, identifying the voice characteristics of the user, and separating the user utterance;

inputting the user utterance into a speech recognition model for speech recognition to generate corresponding text information; the voice recognition model comprises a sampling layer and a voice recognition layer, and the sampling layer performs convolution and down-sampling processing to obtain a local information characteristic vector; inputting the obtained local information characteristic vector into a voice recognition layer of a voice recognition model for processing, and obtaining text information corresponding to the user utterance;

performing semantic analysis according to the generated text information to acquire an entity in the text information to be analyzed; acquiring a structured entity vector corresponding to the entity according to the entity, wherein the structured entity vector is used for indicating the identification of the entity and the attribute of the entity; carrying out feature extraction on the structured entity vector to obtain entity features; fusing the entity features, the lexical features of the text and the syntactic features of the text to obtain semantic features of the text, wherein the semantic features are used for acquiring semantic information of the text, and the semantic information comprises intelligent equipment, operation and function states;

and identifying the intelligent device, the operation, and the function state according to the semantic information, sending a control instruction to the corresponding intelligent device, and causing that device to execute the corresponding operation and enter the corresponding function state.

Specifically, the method further comprises the following steps:

receiving feedback information sent by the intelligent equipment according to the control instruction;

and when the feedback information shows that the operation requested by the control instruction succeeded, returning a reply message and ending the operation.

Specifically, before all of the above steps, the method further comprises:

inputting the voice information of the user in advance, and identifying the voice characteristics of the user from the input voice information.

Specifically, decomposing the voice information, performing feature extraction on the decomposed voice information through a neural network model, and separating out the user utterance specifically includes:

firstly, decomposing the voice information into different frequency bands;

extracting features from the decomposed voice information through the neural network model, and comparing them with the user voice characteristics input in advance to identify the voice characteristics of the user;

and separating the user utterance according to the voice frequency band corresponding to the voice characteristics of the user.
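The comparison against voice characteristics input in advance can be sketched as follows. This is only an illustration under stated assumptions: voiceprints are taken to be fixed-length embedding vectors, matching is done by cosine similarity against a threshold (the text does not name a similarity measure), and the names `enrolled`, `match_speaker`, and the threshold value 0.8 are hypothetical.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_speaker(embedding, enrolled, threshold=0.8):
    """Return the enrolled user whose voiceprint is most similar to
    `embedding`, or None if no similarity reaches `threshold` (assumed value)."""
    best_user, best_score = None, threshold
    for user, voiceprint in enrolled.items():
        score = cosine_similarity(embedding, voiceprint)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user

# Hypothetical voiceprints input in advance for two users.
enrolled = {
    "alice": np.array([1.0, 0.0, 0.0]),
    "bob": np.array([0.0, 1.0, 0.0]),
}
```

Storing one voiceprint per user in a dictionary also covers the case where several users are registered, since every stored voiceprint is checked in turn.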

Another embodiment of the present invention provides an intelligent home device management and control system, including:

a voice input unit: receiving user voice information;

a voice separation unit: decomposing the voice information, extracting the characteristics of the decomposed voice information through a neural network model, identifying the voice characteristics of the user, and separating the user utterance;

a voice recognition unit: inputting the user utterance into a speech recognition model for speech recognition to generate corresponding text information; the voice recognition model comprises a sampling layer and a voice recognition layer, and the sampling layer performs convolution and down-sampling processing to obtain a local information characteristic vector; inputting the obtained local information characteristic vector into a voice recognition layer of a voice recognition model for processing, and obtaining text information corresponding to the user utterance;

a semantic analysis unit: performing semantic analysis according to the generated text information to acquire an entity in the text information to be analyzed; acquiring a structured entity vector corresponding to the entity according to the entity, wherein the structured entity vector is used for indicating the identification of the entity and the attribute of the entity; carrying out feature extraction on the structured entity vector to obtain entity features; fusing the entity features, the lexical features of the text and the syntactic features of the text to obtain semantic features of the text, wherein the semantic features are used for acquiring semantic information of the text, and the semantic information comprises intelligent equipment, operation and function states;

an instruction operation unit: identifying the intelligent device, the operation, and the function state according to the semantic information, sending a control instruction to the corresponding intelligent device, and causing that device to execute the corresponding operation and enter the corresponding function state.

Specifically, the system further comprises:

a feedback unit: receiving feedback information sent by the intelligent device in response to the control instruction.

Specifically, the system further comprises:

a presetting unit: inputting the voice information of the user in advance, and identifying the voice characteristics of the user from the input voice information.

Specifically, in the voice separation unit, the voice information is decomposed, and the voice information after decomposition is subjected to feature extraction through a neural network model, so as to separate the user utterance, specifically including:

firstly, decomposing voice information according to different frequency bands;

Yet another embodiment of the present invention provides an electronic device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the intelligent home device management and control method as described above when executing the computer program.

Yet another embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements a smart home device management and control method as described above.

As can be seen from the above description of the present invention, compared with the prior art, the present invention has the following advantages:

(1) The invention provides an intelligent home device control method, which comprises: receiving user voice information; decomposing the voice information, extracting the characteristics of the decomposed voice information through a neural network model, identifying the voice characteristics of the user, and separating the user utterance; inputting the user utterance into a speech recognition model for speech recognition to generate corresponding text information; performing semantic analysis on the generated text information to obtain semantic information, which comprises the intelligent device, the operation, and the function state; and identifying the intelligent device, the operation, and the function state according to the semantic information, sending a control instruction to the corresponding intelligent device, and causing that device to execute the corresponding operation and enter the corresponding function state. Because the voice characteristics of the user are first identified within mixed noise and the user utterance is separated before speech recognition and semantic analysis are carried out, the intelligent device, operation, and function state in the user's voice instruction are obtained accurately, so that the intelligent device can be controlled accurately and the user experience is improved.

(2) According to the method provided by the invention, the voice information is decomposed and features are extracted from the decomposed voice information through a neural network model, so that the voice characteristics of the user are identified and the user utterance is separated. Interference is eliminated and the user's device-control utterance is isolated, which provides a foundation for accurate control of the intelligent device.

(3) According to the method provided by the invention, semantic analysis is carried out on the generated text information. By fusing entity features, lexical features, and syntactic features, the invention fuses heterogeneous information: semantic information from three different vector spaces is combined to identify the semantics, which improves the accuracy of semantic understanding and, in turn, the control accuracy of the intelligent device.

Drawings

Fig. 1 is a flowchart of a method for managing and controlling smart home devices according to an embodiment of the present invention;

Fig. 2 is a structural diagram of an intelligent home device management and control system according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of an embodiment of an electronic device according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of an embodiment of a computer-readable storage medium according to an embodiment of the present invention.

Detailed Description

The invention is further described below by means of specific embodiments.

The invention provides an intelligent home device control method that first identifies the voice characteristics of the user within mixed noise and separates the user utterance, and then performs voice recognition and semantic analysis to accurately obtain the intelligent device, operation, and function state in the user's voice instruction, thereby realizing accurate control of the intelligent device and improving user experience.

Fig. 1 is a flowchart of a method for managing and controlling smart home devices according to an embodiment of the present invention; the method specifically comprises the following steps:

s101: receiving voice information of a user;

when a user needs to use an intelligent device, the user only needs to issue a voice instruction, such as 'turn on the air conditioner, set the temperature to be 26 degrees' or 'turn on the television, set the channel as a central set'; the method provided by the invention first receives this voice information;

before this step, the method further comprises inputting user voice information in advance and identifying the voice characteristics of the user from the input voice information;

since the method analyzes speech according to the voice characteristics of the user, the user's voice information needs to be input in advance and the voice characteristics recognized from it; the voice information of a plurality of users can also be input, so that other users can control the intelligent home devices as well.

S102: decomposing the voice information, extracting the characteristics of the decomposed voice information through a neural network model, identifying the voice characteristics of the user, and separating the user utterance;

decomposing the voice information, and performing feature extraction on the decomposed voice information through a neural network model to separate user utterances, which specifically comprises the following steps:

firstly, decomposing voice information according to different frequency bands;

The neural network model is a CNN network or a ResNet network. The voice characteristics of the user are voiceprint characteristics, which are unique to each user. The embodiment of the invention uses the CNN network, and the specific extraction process is as follows:

The CNN network comprises a convolution layer, a pooling layer, and a fully connected layer. The voice matrix is input into the convolution layer for local feature extraction, giving the feature vector $C = [c_1, c_2, \ldots, c_m]$, where $m$ is the number of convolution kernels and $c_j = [c_1, c_2, \ldots, c_{n-h+1}]$ is the feature vector produced by the $j$-th filter. Each local feature is $c_i = f(\omega \cdot x_{i:i+h-1} + b)$, where $c_i$ is the local feature extracted by the convolution operation; $f$ is a non-linear function; $\omega$ is a filter of size $h \times d$; $h$ is the size of the convolution kernel, representing $h$ words; $x_{i:i+h-1}$ is the vector formed by the $h$ words from $i$ to $i+h-1$; and $b$ is a bias term. The pooling layer down-samples each feature vector $c_j$ by max pooling to obtain the local optimum $M_j$, giving the down-sampled feature vector $M$, which is input into the fully connected layer to obtain the vector $U$.
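The extraction process above can be sketched in NumPy as follows. This is a minimal illustration, not the embodiment's actual network: the non-linear function f is assumed to be ReLU, the fully connected layer is assumed to be a single affine map, and the names `conv_features`, `extract_vector`, `W`, and `b_fc` are hypothetical.

```python
import numpy as np

def conv_features(x, filters, biases):
    """Convolution layer.
    x: (n, d) input matrix; filters: (m, h, d); biases: (m,).
    Returns C with shape (m, n - h + 1), where C[j, i] = f(w_j . x_{i:i+h-1} + b_j)."""
    n, _ = x.shape
    m, h, _ = filters.shape
    C = np.empty((m, n - h + 1))
    for j in range(m):
        for i in range(n - h + 1):
            window = x[i:i + h]  # the h rows from i to i+h-1
            C[j, i] = np.sum(filters[j] * window) + biases[j]
    return np.maximum(C, 0.0)  # f is assumed to be ReLU

def extract_vector(x, filters, biases, W, b_fc):
    """Convolution, max pooling, then the fully connected layer."""
    C = conv_features(x, filters, biases)
    M = C.max(axis=1)    # max pooling: M_j is the maximum of c_j
    return W @ M + b_fc  # fully connected layer gives the vector U

# Tiny worked example: n = 4, d = 3, m = 2 filters of size h = 2.
x = np.arange(12.0).reshape(4, 3)
filters = np.ones((2, 2, 3))
biases = np.zeros(2)
U = extract_vector(x, filters, biases, W=np.eye(2), b_fc=np.zeros(2))
```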

When a user issues voice information to control an intelligent device, the useful speech is usually mixed with background sounds, such as a household pet or objects colliding, which causes voice control to fail or become inaccurate.

Therefore, the voice information is first decomposed. The decomposition method in the embodiment of the invention splits the voice information into a plurality of frequency bands according to the frequency of the sound. The more frequency bands there are, the higher the control accuracy, but the more complex the algorithm and the greater the delay. Balancing control accuracy against delay, the embodiment of the invention uses 4 frequency bands: below 300 Hz, 300 Hz to 1 kHz, 1 kHz to 3.4 kHz, and above 3.4 kHz.
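A band decomposition of this kind can be sketched as follows. The text does not state how the bands are computed, so this example assumes simple FFT masking, and it assumes the first and last bands are read as "below 300 Hz" and "above 3.4 kHz"; the names `BAND_EDGES_HZ` and `split_into_bands` are hypothetical.

```python
import numpy as np

# Lower edge of each of the 4 bands; the last band runs up to the Nyquist frequency.
BAND_EDGES_HZ = [0.0, 300.0, 1000.0, 3400.0]

def split_into_bands(signal, sample_rate):
    """Split a 1-D signal into 4 band-limited signals by masking its spectrum."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    upper_edges = BAND_EDGES_HZ[1:] + [np.inf]
    bands = []
    for lo, hi in zip(BAND_EDGES_HZ, upper_edges):
        mask = (freqs >= lo) & (freqs < hi)  # keep only bins inside this band
        bands.append(np.fft.irfft(spectrum * mask, n=len(signal)))
    return bands

# Example: a 500 Hz tone plus a 2 kHz tone, sampled at 16 kHz for one second.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone_500 = np.sin(2 * np.pi * 500 * t)
tone_2000 = np.sin(2 * np.pi * 2000 * t)
bands = split_into_bands(tone_500 + tone_2000, sample_rate)
```

Because the masks partition the spectrum, the bands sum back to the original signal, so a later step can keep only the band or bands associated with the user's voice and discard the rest.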

S103: inputting the user utterance into a speech recognition model for speech recognition to generate corresponding text information; the voice recognition model comprises a sampling layer and a voice recognition layer, and the sampling layer performs convolution and down-sampling processing to obtain a local information characteristic vector; inputting the obtained local information characteristic vector into a voice recognition layer of a voice recognition model for processing, and obtaining text information corresponding to the user utterance;

The sampling layer comprises a convolution layer and a pooling layer. The user utterance is convolved by the convolution layer in the sampling layer, the convolution result is input into the pooling layer in the sampling layer, and the convolution result is down-sampled by max pooling to obtain the local information feature vector. Max pooling takes the point with the maximum value in each local receptive field. The specific operations of convolution and down-sampling belong to the prior art and are not described in detail.

The embodiment of the invention adopts, as the voice recognition model, a long short-term memory (LSTM) network with the CTC (connectionist temporal classification) algorithm. The main process of voice recognition is as follows: (1) extracting acoustic features from the sound waveform; (2) converting the acoustic features into phonemes; (3) converting the phonemes into readable text information by decoding. The decoding process computes acoustic-model and language-model scores for a given phoneme sequence and several hypothesized word sequences, and takes the sequence with the highest overall score as the recognition result.
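Two pieces of this decoding stage can be illustrated with a short sketch. The first function applies the standard CTC collapsing rule (merge consecutive repeats, then drop blanks) to per-frame labels; the second picks the hypothesis with the highest combined score. The blank symbol `"_"`, the weight `lm_weight`, the log-score values, and both function names are assumptions made for illustration, not details given in the text.

```python
def ctc_greedy_decode(frame_labels, blank="_"):
    """Collapse consecutive repeated labels, then remove blanks."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)

def best_hypothesis(hypotheses, lm_weight=0.5):
    """hypotheses: list of (text, acoustic_log_score, lm_log_score).
    Returns the text whose combined score is highest."""
    return max(hypotheses, key=lambda h: h[1] + lm_weight * h[2])[0]

decoded = ctc_greedy_decode(list("hh_e_ll_lo"))
chosen = best_hypothesis([
    ("turn on the air conditioner", -4.0, -2.0),   # combined score: -5.0
    ("turn on the hair conditioner", -3.8, -6.0),  # combined score: -6.8
])
```

In the second example the language-model score outweighs a slightly better acoustic score, which is the effect the overall-score rule is meant to achieve.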

S104: performing semantic analysis according to the generated text information to acquire an entity in the text information to be analyzed; acquiring a structured entity vector corresponding to the entity according to the entity, wherein the structured entity vector is used for indicating the identification of the entity and the attribute of the entity; carrying out feature extraction on the structured entity vector to obtain entity features; fusing the entity features, the lexical features of the text and the syntactic features of the text to obtain semantic features of the text, wherein the semantic features are used for acquiring semantic information of the text, and the semantic information comprises intelligent equipment, operation and function states;

By fusing the entity features, the lexical features, and the syntactic features, the invention fuses heterogeneous information: semantic information from three different vector spaces is combined to identify the semantics, which improves the accuracy of semantic understanding.
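The text does not say how the three feature vectors are fused. One common choice, shown here purely as an assumed sketch, is to concatenate them and apply a learned projection; the function name `fuse_features`, the projection `W`, `b`, and the tanh non-linearity are all hypothetical.

```python
import numpy as np

def fuse_features(entity_feat, lexical_feat, syntactic_feat, W, b):
    """Concatenate the three feature vectors and project them into
    a single semantic feature vector (assumed fusion scheme)."""
    joint = np.concatenate([entity_feat, lexical_feat, syntactic_feat])
    return np.tanh(W @ joint + b)

# Example with feature vectors of different sizes (2 + 3 + 1 = 6 inputs, 4 outputs).
rng = np.random.default_rng(0)
entity_feat = np.array([0.2, -0.1])
lexical_feat = np.array([0.5, 0.0, 0.3])
syntactic_feat = np.array([-0.4])
W = rng.standard_normal((4, 6))
b = np.zeros(4)
semantic_feat = fuse_features(entity_feat, lexical_feat, syntactic_feat, W, b)
```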

The semantic information includes the intelligent device, the operation, and the function state. For example, in the voice command "turn on the air conditioner, set the temperature to 26 degrees", "air conditioner" is the intelligent device, "turn on" is the operation, and "temperature to 26 degrees" is the function state; likewise, in "turn on the television, the channel is the central set", "television" is the intelligent device, "turn on" is the operation, and "channel is the central set" is the function state.
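The three-part result of semantic analysis can be represented as a small record. The sketch below uses simple keyword matching as a stand-in so that the example runs; it is not the feature-fusion analysis described in S104, and the names `SemanticInfo`, `KNOWN_DEVICES`, `KNOWN_OPERATIONS`, and `parse_command` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SemanticInfo:
    device: str
    operation: str
    function_state: Optional[str] = None

KNOWN_DEVICES = ("air conditioner", "television")  # hypothetical vocabulary
KNOWN_OPERATIONS = ("turn on", "turn off")         # hypothetical vocabulary

def parse_command(text: str) -> Optional[SemanticInfo]:
    """Keyword-matching stand-in for semantic analysis."""
    lowered = text.lower()
    device = next((d for d in KNOWN_DEVICES if d in lowered), None)
    operation = next((o for o in KNOWN_OPERATIONS if o in lowered), None)
    if device is None or operation is None:
        return None
    # Everything after the first comma is treated as the function state.
    _, _, tail = lowered.partition(",")
    return SemanticInfo(device, operation, tail.strip() or None)

info = parse_command("Turn on the air conditioner, set the temperature to 26 degrees")
```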

S105: identifying the intelligent device, the operation, and the function state according to the semantic information, sending a control instruction to the corresponding intelligent device, and causing that device to execute the corresponding operation and enter the corresponding function state.

After receiving the control instruction, the intelligent device executes the corresponding action and returns feedback information, such as 'completed' or 'air conditioner turned on, temperature set to 26 degrees';

when the feedback indicates success, a reply message is returned, e.g. "Received, thank you!"
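The exchange of control instruction, feedback, and reply can be sketched as follows. The device model, the feedback dictionary format, and the names `Device` and `send_control_instruction` are hypothetical; a real system would send the instruction over whatever protocol the device supports.

```python
from dataclasses import dataclass, field

@dataclass
class Device:
    name: str
    powered: bool = False
    state: dict = field(default_factory=dict)

    def execute(self, operation, function_state=None):
        """Carry out the operation and return feedback information."""
        if operation == "turn on":
            self.powered = True
        elif operation == "turn off":
            self.powered = False
        else:
            return {"ok": False, "message": f"unsupported operation: {operation}"}
        if function_state:
            self.state.update(function_state)
        return {"ok": True, "message": "completed"}

def send_control_instruction(devices, device_name, operation, function_state=None):
    """Dispatch the instruction to the named device and build the reply."""
    device = devices.get(device_name)
    if device is None:
        return "device not found"
    feedback = device.execute(operation, function_state)
    return "Received, thank you!" if feedback["ok"] else feedback["message"]

devices = {"air conditioner": Device("air conditioner")}
reply = send_control_instruction(devices, "air conditioner", "turn on", {"temperature": 26})
```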

As shown in fig. 2, another embodiment of the present invention provides an intelligent home device management and control system, including:

the voice input unit 201: receiving voice information of a user;

in addition, the system further comprises a presetting unit, which is used for inputting the voice information of the user in advance and recognizing the voice characteristics of the user from the input voice information;

The voice separation unit 202: decomposing the voice information, extracting the characteristics of the decomposed voice information through a neural network model, identifying the voice characteristics of the user, and separating the user utterance;

firstly, decomposing voice information according to different frequency bands;

The voice recognition unit 203: inputting the user utterance into a speech recognition model for speech recognition to generate corresponding text information; the voice recognition model comprises a sampling layer and a voice recognition layer, and the sampling layer performs convolution and down-sampling processing to obtain a local information characteristic vector; inputting the obtained local information characteristic vector into a voice recognition layer of a voice recognition model for processing, and obtaining text information corresponding to the user utterance;

The semantic analysis unit 204: performing semantic analysis according to the generated text information to acquire an entity in the text information to be analyzed; acquiring a structured entity vector corresponding to the entity according to the entity, wherein the structured entity vector is used for indicating the identification of the entity and the attribute of the entity; carrying out feature extraction on the structured entity vector to obtain entity features; fusing the entity features, the lexical features of the text and the syntactic features of the text to obtain semantic features of the text, wherein the semantic features are used for acquiring semantic information of the text, and the semantic information comprises intelligent equipment, operation and function states;

The instruction operation unit 205: identifying the intelligent device, the operation, and the function state according to the semantic information, sending a control instruction to the corresponding intelligent device, and causing that device to execute the corresponding operation and enter the corresponding function state.

A reply message is then returned, e.g. "Received, thank you!"
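The way units 202 to 205 chain together can be sketched as a simple pipeline. Each unit is represented by a callable supplied by the caller; the class name `ControlSystem`, its parameter names, and the stub units in the example are hypothetical placeholders rather than the units' real implementations.

```python
class ControlSystem:
    """Chains the separation, recognition, analysis, and operation units."""

    def __init__(self, separate, recognize, analyze, operate):
        self.separate = separate    # voice separation unit 202
        self.recognize = recognize  # voice recognition unit 203
        self.analyze = analyze      # semantic analysis unit 204
        self.operate = operate      # instruction operation unit 205

    def handle(self, voice_information):
        """Process voice information received by the voice input unit 201."""
        utterance = self.separate(voice_information)
        text = self.recognize(utterance)
        semantics = self.analyze(text)
        return self.operate(semantics)

# Stub units, used only to show the data flow.
system = ControlSystem(
    separate=lambda audio: audio.strip(),
    recognize=lambda utterance: utterance.lower(),
    analyze=lambda text: tuple(text.split()),
    operate=lambda sem: f"sent {sem[0]} to {sem[1]}",
)
result = system.handle("  ON lamp ")
```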

As shown in Fig. 3, an embodiment of the present invention provides an electronic device 300, which includes a memory 310, a processor 320, and a computer program 311 stored in the memory 310 and running on the processor 320, where when the processor 320 executes the computer program 311, the intelligent home device management and control method provided by the embodiment of the present invention is implemented.

In a specific implementation, when the processor 320 executes the computer program 311, any of the embodiments corresponding to fig. 1 may be implemented.

The electronic device described in this embodiment is a device for implementing the method of the embodiment of the present invention. Based on the method described herein, a person skilled in the art can understand the specific implementation of the electronic device and its variations, so how the electronic device implements the method is not described in detail here. Any device that a person skilled in the art uses to implement the method of the embodiment of the present invention falls within the protection scope of the present invention.

Referring to fig. 4, fig. 4 is a schematic diagram illustrating an embodiment of a computer-readable storage medium according to the present invention.

As shown in fig. 4, the present embodiment provides a computer-readable storage medium 400, on which a computer program 411 is stored, the computer program 411 implementing an intelligent home device management and control method provided by the present embodiment when executed by a processor;

in a specific implementation, the computer program 411 may implement any of the embodiments corresponding to fig. 1 when executed by a processor.

It should be noted that, in the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to relevant descriptions of other embodiments for parts that are not described in detail in a certain embodiment.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The above description is only an embodiment of the present invention, but the design concept of the present invention is not limited thereto; any insubstantial modification made to the present invention by using this design concept shall be regarded as an act infringing the protection scope of the present invention.

Claims

1. An intelligent home device management and control method is characterized by comprising the following steps:

receiving user voice information;

firstly, decomposing voice information according to different frequency bands;

separating user words according to a voice frequency band corresponding to the voice characteristics of the user;

2. The intelligent home device management and control method according to claim 1, further comprising:

3. The intelligent home device management and control method according to claim 1, wherein all the steps are preceded by:

4. An intelligent home device management and control system, characterized by comprising:

a voice input unit: receiving user voice information;

in the voice separation unit, decomposing the voice information, performing feature extraction on the decomposed voice information through a neural network model, and separating out the user utterance, which specifically includes:

firstly, decomposing voice information according to different frequency bands;

5. The intelligent home device management and control system according to claim 4, further comprising:

6. The intelligent home device management and control system according to claim 4, further comprising:

7. An electronic device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing a smart home device management and control method according to any one of claims 1 to 3 when executing the computer program.

8. A computer-readable storage medium on which a computer program is stored, the program implementing a smart home device management and control method according to any one of claims 1 to 3 when executed by a processor.