CN109377995B - Method and device for controlling equipment - Google Patents
Method and device for controlling equipment
- Publication number
- CN109377995B (CN201811381967.3A)
- Authority
- CN
- China
- Prior art keywords
- control command
- voice control
- mouth shape
- voice
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/2803—Home automation networks
- H04L12/2816—Controlling appliance services of a home automation network by calling their functionalities
- H04L12/282—Controlling appliance services of a home automation network by calling their functionalities based on user interaction within the home
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Automation & Control Theory (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Selective Calling Equipment (AREA)
Abstract
The invention discloses a method and a device for controlling equipment, which address the problem in the prior art that a voice control command obtained by analysis has low accuracy when a smart home device is controlled. In the method, a determined voice control command is first matched against the lip language mouth shape information captured while the user issues the voice control command for the smart home device; a first voice control command is determined according to the matching result, and the smart home device is controlled according to the first voice control command. Because the voice control command is matched against the user's lip language mouth shape at the moment the command is issued, and the command used to control the smart home device is determined from the matching result, the accuracy of extracting the voice control command can be improved.
Description
Technical Field
The present invention relates to the field of wireless communications technologies, and in particular, to a method and an apparatus for controlling a device.
Background
A smart home is an ecosystem that uses the house as a platform and connects the devices in it through Internet of Things technology to make the home intelligent. It provides functions such as intelligent lighting control, intelligent appliance control, security monitoring, intelligent background music, intelligent video sharing, video intercom, and home theater.
A smart home integrates the facilities of home life using integrated wiring, network communication, security, automatic control, and audio and video technologies. It builds an efficient management system for residential facilities and household schedules, improves the safety, convenience, comfort, and aesthetics of the home, and provides an environmentally friendly, energy-saving living environment.
In an existing smart home environment, when a target user controls a device in the smart home system by voice, the smart home system server directly analyzes the collected voice information and extracts the user's voice control command from it, then controls the corresponding smart home device according to the determined command. In daily life, however, a home is usually equipped with several smart home devices and has several users. While one user is controlling a smart home device, other users may be controlling other devices or simply talking. The voice information collected by the server is then very complex, and if the voice command is extracted from it directly, noise interference lowers the accuracy with which the user's voice control command is analyzed and recognized.
In summary, when a smart home device is controlled, the accuracy of the user's voice control command obtained by analysis is low.
Disclosure of Invention
The invention provides a method and a device for controlling equipment, which address the problem in the prior art that a voice control command obtained by analysis has low accuracy when a smart home device is controlled.
In a first aspect, an embodiment of the present invention provides a method for controlling a device, where the method includes:
matching a determined voice control command against the lip language mouth shape information captured while a user issues the voice control command for a smart home device, and determining a first voice control command according to the matching result;
and controlling the smart home device according to the first voice control command.
In this method, the determined voice control command is first matched against the lip language mouth shape information captured while the user issues the voice control command for the smart home device, the first voice control command is determined according to the matching result, and the smart home device is controlled according to the first voice control command. Because the command is checked against the user's lip language mouth shape at the moment it is issued, and the command used for control is determined from the matching result, the accuracy of extracting the voice control command can be improved.
In one possible implementation, the matching result is determined by:
determining whether the matching degree between the determined voice control command and the lip language mouth shape information captured while the user issues the voice control command for the smart home device is smaller than a threshold, and if so, determining that noise exists in the determined voice control command;
otherwise, determining that the determined voice control command is free of noise.
This provides a way to determine whether noise exists in the voice control command: by using the matching degree between the lip language mouth shape information and the voice control command, the presence of noise can be judged more accurately.
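The threshold test described here can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `has_noise` and the default threshold of 0.8 are assumptions chosen for the example, and the matching degree is taken as an already computed number between 0 and 1.

```python
def has_noise(matching_degree: float, threshold: float = 0.8) -> bool:
    """Return True when the recognized command is considered noisy.

    matching_degree: agreement between the recognized voice control command
    and the lip language mouth shape information, on an assumed 0..1 scale.
    threshold: illustrative default; the text does not fix a value here.
    """
    # Noise is reported only when the degree is strictly below the threshold;
    # a degree equal to the threshold counts as "no noise".
    return matching_degree < threshold


if __name__ == "__main__":
    print(has_noise(0.9))
    print(has_noise(0.5))
```

A matching degree equal to the threshold is treated as noise-free, which follows the "smaller than a threshold ... otherwise" wording of the text.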
In a possible implementation manner, the determining the first voice control command according to the matching result includes:
if the determined voice control command has noise, determining a first voice control command according to the voice word number information corresponding to the determined voice control command and the lip language mouth shape information;
and if the determined voice control command has no noise, taking the determined voice control command as the first voice control command.
In this method, the first voice control command is determined according to the matching result: if noise exists in the determined voice control command, the first voice control command is determined according to the voice word number information corresponding to the determined voice control command and the lip language mouth shape information; if no noise exists, the determined voice control command is used as the first voice control command. This improves the accuracy of analyzing the voice control command.
In a possible implementation manner, the determining a first voice control command according to the voice word number information and the lip language mouth shape information corresponding to the determined voice control command includes:
if the voice word number information corresponding to the determined voice control command is larger than the mouth shape conversion count information corresponding to the lip language mouth shape information, discarding the part of the determined voice control command that does not match the lip language mouth shape information to obtain a second voice control command;
and if the voice word number information corresponding to the second voice control command is larger than the mouth shape conversion count information corresponding to the lip language mouth shape information, and the matching degree between the second voice control command and the lip language mouth shape information captured while the user issues the voice control command for the smart home device is smaller than the threshold, replacing the part of the voice control command that cannot be matched with the lip language mouth shape information with the replacement word information corresponding to the lip language mouth shape information, according to a replacement principle, to obtain the first voice control command.
In this method, the voice word number information corresponding to the voice control command is compared with the mouth shape conversion count of the lip language mouth shape captured while the user addresses the smart home device. The part of the voice control command that does not match the lip language mouth shape is discarded, and the part that still cannot be matched is replaced according to the replacement principle. Noise in the voice control command is thus filtered out with the help of the lip language mouth shape information, which improves the accuracy of analyzing the voice control command.
In a possible implementation manner, the determining a first voice control command according to the voice word number information and the lip language mouth shape information corresponding to the determined voice control command includes:
and if the voice word number information corresponding to the determined voice control command is not larger than the mouth shape conversion count information corresponding to the lip language mouth shape information, replacing the part of the voice control command that cannot be matched with the lip language mouth shape information with the replacement word information corresponding to the lip language mouth shape information, according to a replacement principle, to obtain the first voice control command.
This provides another way to filter noise out of the voice control command: if the voice word number information corresponding to the determined voice control command is not larger than the mouth shape conversion count information corresponding to the lip language mouth shape information, the part of the voice control command that cannot be matched with the lip language mouth shape information is replaced according to the replacement principle. Noise in the voice control command is thus filtered out with the help of the lip language mouth shape information, which improves the accuracy of analyzing the voice control command.
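The branching described in the preceding implementations (no noise, more words than mouth shape conversions, or not more) can be sketched as follows. This is an illustrative sketch under several assumptions that the text does not specify: the `LipReading` structure, the `word_matches` test (a word matches if lip reading proposed the same word), the matching degree formula (share of matched words), the position-based replacement principle, and the fallback of returning the second command when the second condition does not hold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LipReading:
    """Hypothetical output of the image recognition model."""
    mouth_shape_changes: int      # mouth shape conversion count
    replacement_words: List[str]  # replacement word information, in order


def word_matches(word: str, lips: LipReading) -> bool:
    # Assumption: a word matches the lips if lip reading proposed the same word.
    return word in lips.replacement_words


def matching_degree(words: List[str], lips: LipReading) -> float:
    # Assumption: degree = share of recognized words confirmed by the lips.
    if not words:
        return 0.0
    return sum(word_matches(w, lips) for w in words) / len(words)


def replace_unmatched(words: List[str], lips: LipReading) -> List[str]:
    # Assumed replacement principle: an unmatched word is swapped for the
    # lip-read word at the same position, when one exists.
    result = []
    for i, word in enumerate(words):
        if word_matches(word, lips) or i >= len(lips.replacement_words):
            result.append(word)
        else:
            result.append(lips.replacement_words[i])
    return result


def determine_first_command(words: List[str], lips: LipReading,
                            threshold: float = 0.8) -> List[str]:
    # No noise: use the recognized command unchanged.
    if matching_degree(words, lips) >= threshold:
        return list(words)
    # More words than mouth shape conversions: discard unmatched words first.
    if len(words) > lips.mouth_shape_changes:
        second = [w for w in words if word_matches(w, lips)]
        # With this simple matcher every surviving word matches, so this
        # branch rarely fires; it is kept to mirror the described condition.
        if (len(second) > lips.mouth_shape_changes
                and matching_degree(second, lips) < threshold):
            return replace_unmatched(second, lips)
        return second  # assumption: otherwise the second command is used
    # Not more words than conversions: replace unmatched words directly.
    return replace_unmatched(words, lips)
```

For instance, with lip reading that yields `["open", "air conditioner"]` and two mouth shape conversions, a recognized command of `["open", "air conditioner", "dehumidify"]` loses the extra word, while `["open", "air container"]` has its second word replaced.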
In one possible implementation, the voice word number information corresponding to the determined voice control command is determined by:
and analyzing the acquired voice information according to a voice recognition model, wherein the voice recognition model is obtained by training a neural network according to the voice information, the voice control command and the voice word number information.
According to the method, the acquired voice information is analyzed according to the voice recognition model, and the voice recognition model is obtained through neural network training according to the voice information, the voice control command and the voice word number information, so that the voice control command and the voice word number information corresponding to the voice information can be analyzed.
In a possible implementation manner, the mouth shape conversion count information and the replacement word information corresponding to the lip language mouth shape information are determined as follows:
analyzing the obtained lip language mouth shape information according to an image recognition model, where the image recognition model is obtained by training a neural network on lip language mouth shape information, mouth shape conversion count information, and replacement word information.
In this method, the obtained lip language mouth shape information is analyzed with the image recognition model. Because the model is trained by a neural network on lip language mouth shape information, mouth shape conversion count information, and replacement word information, analyzing lip language mouth shape information yields the corresponding mouth shape conversion count and replacement word information.
In a second aspect, an embodiment of the present invention provides an apparatus for controlling a device, where the apparatus includes: at least one processing unit and at least one memory unit, wherein the memory unit stores program code that, when executed by the processing unit, causes the processing unit to perform the following:
matching a determined voice control command against the lip language mouth shape information captured while a user issues the voice control command for a smart home device, and determining a first voice control command according to the matching result;
and controlling the smart home device according to the first voice control command.
In a possible implementation manner, the processing unit is specifically configured to:
determine whether the matching degree between the determined voice control command and the lip language mouth shape information captured while the user issues the voice control command for the smart home device is smaller than a threshold, and if so, determine that noise exists in the determined voice control command;
otherwise, determine that the determined voice control command is free of noise.
In a possible implementation manner, the processing unit is specifically configured to:
if noise exists in the determined voice control command, determine the first voice control command according to the voice word number information corresponding to the determined voice control command and the lip language mouth shape information;
and if no noise exists in the determined voice control command, use the determined voice control command as the first voice control command.
In a possible implementation manner, the processing unit is specifically configured to:
if the voice word number information corresponding to the determined voice control command is larger than the mouth shape conversion count information corresponding to the lip language mouth shape information, discard the part of the determined voice control command that does not match the lip language mouth shape information to obtain a second voice control command;
and if the voice word number information corresponding to the second voice control command is larger than the mouth shape conversion count information corresponding to the lip language mouth shape information, and the matching degree between the second voice control command and the lip language mouth shape information captured while the user issues the voice control command for the smart home device is smaller than the threshold, replace the part of the voice control command that cannot be matched with the lip language mouth shape information with the replacement word information corresponding to the lip language mouth shape information, according to a replacement principle, to obtain the first voice control command.
In a possible implementation manner, the processing unit is specifically configured to:
if the voice word number information corresponding to the determined voice control command is not larger than the mouth shape conversion count information corresponding to the lip language mouth shape information, replace the part of the voice control command that cannot be matched with the lip language mouth shape information with the replacement word information corresponding to the lip language mouth shape information, according to a replacement principle, to obtain the first voice control command.
In a possible implementation manner, the processing unit is specifically configured to:
analyze the acquired voice information according to a voice recognition model, where the voice recognition model is obtained by training a neural network on voice information, voice control commands, and voice word number information.
In a possible implementation manner, the processing unit is specifically configured to:
analyze the obtained lip language mouth shape information according to an image recognition model, where the image recognition model is obtained by training a neural network on lip language mouth shape information, mouth shape conversion count information, and replacement word information.
In a third aspect, an embodiment of the present invention provides an apparatus for controlling a device, where the apparatus includes:
a determination module, configured to match a determined voice control command against the lip language mouth shape information captured while a user issues the voice control command for a smart home device, and to determine a first voice control command according to the matching result;
a control module, configured to control the smart home device according to the first voice control command.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method in the first aspect.
In addition, for technical effects brought by any one implementation manner of the second aspect to the fourth aspect, reference may be made to technical effects brought by different implementation manners of the first aspect, and details are not described here.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a method for controlling a device according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a first control device according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a second control device according to an embodiment of the present invention;
Fig. 4 is a flowchart of a complete method for controlling a device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As smart devices become widespread, smart homes are gradually entering people's lives. When a user wants to control a smart home device by voice, the collected voice information must be analyzed to obtain a voice control command, and the smart home device is then controlled according to that command. If the collected voice information contains only the voice of the user addressing the smart home device to be controlled, the device can be controlled according to the voice control command obtained from it. If it also contains other voice information, for example the voice of other users addressing other smart home devices or conversations among other users, the voice control command obtained by analysis may be inaccurate, and the smart home device may then be controlled incorrectly.
For example, user A wants to turn on air conditioner 1 and says "open the air conditioner", while user B wants air conditioner 2 to dehumidify and says "dehumidify". Both utterances are collected. For air conditioner 1, the voice control command obtained by analyzing the collected voice information is then "open the air conditioner, dehumidify", which is an erroneous voice control command; in other words, noise exists in the analyzed voice control command.
If noise exists in the voice control command analyzed according to the collected voice information, the noise in the voice control command needs to be filtered, and then the intelligent household equipment is controlled according to the filtered voice control command.
The execution subject in the embodiment of the present invention may be a server.
In the embodiment of the invention, the voice information can be collected through a microphone, and the lip language mouth shape information can be collected through a camera.
The application scenario described in the embodiment of the present invention is intended to illustrate the technical solution more clearly and does not limit the technical solution provided in the embodiment of the present invention. A person skilled in the art will appreciate that, as new service scenarios emerge, the technical solution provided in the embodiment of the present invention is equally applicable to similar technical problems.
In view of the foregoing application scenarios, an embodiment of the present invention provides a method for controlling a device, and as shown in fig. 1, the method specifically includes the following steps:
In the embodiment of the invention, the determined voice control command is matched against the lip language mouth shape information captured while the user issues the voice control command for the smart home device, a first voice control command is determined according to the matching result, and the smart home device is controlled according to the first voice control command. Because the determined command is checked against the user's lip language mouth shape at the moment the command is issued, and the command used for control is determined from the matching result, the accuracy of extracting the voice control command can be improved.
In implementation, before determining the voice control command, if the microphone and the camera of the smart home device are in a sleep state, the user needs to wake up the microphone and the camera.
The microphone and the camera of the smart home device may be woken up with a wake-up word; for example, when a user says the wake-up word "air conditioner 1", the microphone and the camera connected to air conditioner 1 are woken up. They may also be woken up remotely, for example with a remote control.
After the microphone and the camera are awakened, the microphone can collect voice information, and the camera collects lip language mouth shape information of a user aiming at the intelligent household equipment.
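The two wake-up paths can be sketched as follows. This is only an illustration: the `DeviceSensors` class, the exact-match comparison of the wake-up word with the device name, and the function names are assumptions made for the example, not details given in the text.

```python
from dataclasses import dataclass


@dataclass
class DeviceSensors:
    """Hypothetical microphone and camera state for one smart home device."""
    device_name: str
    microphone_awake: bool = False
    camera_awake: bool = False

    def wake(self) -> None:
        # Both sensors are woken together, as in the description.
        self.microphone_awake = True
        self.camera_awake = True


def wake_by_word(sensors: DeviceSensors, spoken: str) -> bool:
    # Assumption: the wake-up word is the device name, compared
    # case-insensitively after trimming whitespace.
    if spoken.strip().lower() == sensors.device_name.lower():
        sensors.wake()
        return True
    return False


def wake_by_remote(sensors: DeviceSensors) -> None:
    # Remote-control wake-up needs no spoken word.
    sensors.wake()
```

Calling `wake_by_word(DeviceSensors("air conditioner 1"), "air conditioner 1")` wakes both sensors, while a different wake-up word leaves them asleep.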
It should be noted that the microphone collects whatever voice information it can pick up. For example, user A says "turn on the air conditioner 1" and user B says "dehumidify"; if the microphone can pick up both, the voice information it collects contains both "turn on the air conditioner" and "dehumidify".
When the camera collects lip language mouth shape information, it collects that of the user addressing the smart home device to be controlled. For example, if user A wants to control smart air conditioner 1, user A needs to stand within the field of view of the camera connected to smart air conditioner 1 and then speak.
The following describes the speech information analysis and lip language mouth shape information analysis, respectively.
The microphone sends the voice information to the server after collecting the voice information, and the server analyzes the voice information after receiving the voice information.
Specifically, when analyzing the voice information, the obtained voice information may be analyzed according to a voice recognition model, where the voice recognition model is obtained by training through a neural network according to the voice information, the voice control command, and the voice word number information.
It should be noted that building the voice recognition model requires a large amount of voice information, voice control commands, and voice word number information. The model is obtained by training a neural network on this data; once voice information is input into the model, the corresponding voice control command and voice word number information are obtained.
After analyzing the acquired voice information, the server obtains the voice control command and the voice word number information corresponding to the voice information.
For example, the microphone sends the collected voice information "turn on the air conditioner 1" to the server. After receiving it, the server analyzes the voice information and obtains the corresponding control command "on", with a voice word number of 2.
The server's analysis of the acquired voice information may fail. For example, if the voice information collected by a microphone connected to the air conditioner is "turn on a television", the analysis fails, and the server may push a message to the user indicating that the voice analysis failed.
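The server-side handling of a successful or failed voice analysis can be sketched as below. A small lookup table stands in for the trained voice recognition model, which is a deliberate simplification; the table contents, the `VoiceAnalysis` structure, the `notify` callback, and counting words by whitespace in English are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class VoiceAnalysis:
    command: str     # voice control command
    word_count: int  # voice word number information


# Hypothetical lookup table standing in for the trained voice recognition
# model of a microphone attached to an air conditioner.
AIR_CONDITIONER_COMMANDS: Dict[str, str] = {
    "turn on the air conditioner": "turn on",
    "dehumidify": "dehumidify",
}


def analyze_voice(transcript: str) -> Optional[VoiceAnalysis]:
    """Return the command and its word count, or None if analysis fails."""
    command = AIR_CONDITIONER_COMMANDS.get(transcript.strip().lower())
    if command is None:
        return None
    # Assumption: words are counted as whitespace-separated tokens.
    return VoiceAnalysis(command, len(command.split()))


def handle_voice(transcript: str,
                 notify: Callable[[str], None]) -> Optional[VoiceAnalysis]:
    result = analyze_voice(transcript)
    if result is None:
        # Push a failure message to the user, as the description suggests.
        notify("voice analysis failed")
    return result
```

Passing `print` as `notify` is enough to try it out: `handle_voice("turn on a television", print)` reports a failure, because that utterance is not a command this device understands.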
The above is the analysis of the voice information by the server, and the following is the analysis of the lip language mouth shape information by the server.
After collecting the lip language mouth shape information, the camera sends it to the server, and the server analyzes it on receipt.
Specifically, the obtained lip language mouth shape information may be analyzed according to an image recognition model, where the image recognition model is obtained by training a neural network on lip language mouth shape information, mouth shape conversion count information, and replacement word information.
It should be noted that building the image recognition model requires a large amount of lip language mouth shape information, mouth shape conversion count information, and replacement word information. The model is obtained by training a neural network on this data; once lip language mouth shape information is input into the model, the corresponding mouth shape conversion count information and replacement word information are obtained.
After analyzing the obtained lip language mouth shape information, the server obtains the mouth shape conversion count information and the replacement word information corresponding to the lip language mouth shape information.
After acquiring the voice information, the server first analyzes it into a voice control command. If the analysis succeeds, the server matches the resulting voice control command against the lip language mouth shape information captured when the user issued the voice control command for the smart home device, determines a first voice control command according to the matching result, and controls the smart home device according to the first voice control command.
In implementation, there are two possible matching results. First, if the matching degree between the determined voice control command and the lip language mouth shape information captured when the user issued the voice control command for the smart home device is less than a threshold, it is determined that the determined voice control command contains noise;
second, if the matching degree is not less than the threshold, it is determined that the voice control command contains no noise.
For example, suppose the voice control command analyzed by the server is "turn on the air conditioner 1". If matching this command against the obtained lip language mouth shape information of the user for the smart home device yields a matching degree of 90% and the threshold is 80%, it is determined that the voice control command contains no noise;
for another example, if the same matching yields a matching degree of 70% and the threshold is 80%, it is determined that the voice control command contains noise.
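The threshold test in the two examples above can be sketched as follows. The function name `has_noise` is hypothetical, and the matching degree is taken as an input because the way it is computed is not specified here; the default threshold of 80% is the value used in the examples.

```python
def has_noise(matching_degree: float, threshold: float = 0.80) -> bool:
    """Return True when the voice control command is deemed to contain noise.

    Noise is present when the matching degree between the recognized
    command and the lip language mouth shape information is strictly
    less than the threshold; a degree equal to the threshold counts
    as "not less than" and therefore as no noise.
    """
    return matching_degree < threshold


def describe(matching_degree: float, threshold: float = 0.80) -> str:
    """Human-readable form of the decision, for logging or display."""
    return "noise" if has_noise(matching_degree, threshold) else "no noise"
```

A matching degree of 90% gives "no noise" and 70% gives "noise", in line with the two examples.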
If the determined voice control command contains no noise, it is taken as the first voice control command, that is, the smart home device is controlled according to it;
if the determined voice control command contains noise, the noise is filtered according to the voice word number information corresponding to the determined voice control command and the lip language mouth shape information, and the noise-filtered voice control command is taken as the first voice control command and used to control the smart home device.
In the embodiment of the present invention, noise filtering for a noisy voice control command falls into two cases. In case one, the voice word number information corresponding to the voice control command is greater than the mouth shape conversion frequency information corresponding to the lip language mouth shape information; in case two, the voice word number information is not greater than the mouth shape conversion frequency information. The two cases are described separately below.
Case one: the voice word number information corresponding to the voice control command is greater than the mouth shape conversion frequency information corresponding to the lip language mouth shape information.
If the voice word number information corresponding to the voice control command is greater than the mouth shape conversion frequency information corresponding to the lip language mouth shape information, the part of the voice control command that does not match the lip language mouth shape information is discarded to obtain a second voice control command.
For example, suppose the voice control command is "turn on the air conditioner A to dehumidify", so that the voice word number information corresponding to the voice control command is 7, and the mouth shape conversion frequency information corresponding to the lip language mouth shape information is 5. The voice word number information is then greater than the mouth shape conversion frequency information, so the voice control command is matched against the lip language mouth shape information and the part that does not match is discarded. If the unmatched part is "dehumidify", it is discarded, leaving "turn on the air conditioner A", which is the noise-filtered voice control command.
If the matching degree between the second voice control command and the lip language mouth shape information is not less than the threshold, the second voice control command is taken as the first voice control command, and the smart home device is controlled according to the first voice control command;
if the matching degree between the second voice control command and the lip language mouth shape information is less than the threshold, the part of the voice control command that cannot be matched with the lip language mouth shape information is replaced, according to the replacement principle, with the replacement word information corresponding to the unmatched lip language mouth shape information to obtain the first voice control command, and the smart home device is controlled according to the first voice control command.
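The discarding step of case one can be sketched as below. This is a minimal illustration under stated assumptions: the lip language analysis is assumed to yield one word per mouth shape change, so the length of `lip_read` stands for the mouth shape conversion frequency information, and the names `discard_unmatched` and `filter_case_one` are hypothetical. The sketch tokenizes by English word, so its counts (6 and 5) differ from the 7 and 5 of the example above.

```python
from typing import List


def discard_unmatched(spoken: List[str], lip_read: List[str]) -> List[str]:
    """Keep only the spoken words that have a counterpart in the lip-read words."""
    remaining = list(lip_read)  # each lip-read word may be consumed only once
    kept: List[str] = []
    for word in spoken:
        if word in remaining:
            remaining.remove(word)
            kept.append(word)
    return kept


def filter_case_one(spoken: List[str], lip_read: List[str]) -> List[str]:
    """Return the second voice control command for case one.

    Case one applies only when the spoken word count exceeds the
    number of mouth shape changes.
    """
    if len(spoken) <= len(lip_read):
        raise ValueError("case one requires more spoken words than mouth shape changes")
    return discard_unmatched(spoken, lip_read)
```

For the spoken words `["turn", "on", "air", "conditioner", "A", "dehumidify"]` and the lip-read words `["turn", "on", "air", "conditioner", "A"]`, the word "dehumidify" has no lip-read counterpart and is dropped.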
Case two: the voice word number information corresponding to the voice control command is not greater than the mouth shape conversion frequency information corresponding to the lip language mouth shape information.
If the voice word number information corresponding to the voice control command is not greater than the mouth shape conversion frequency information corresponding to the lip language mouth shape information, the part of the voice control command that cannot be matched with the lip language mouth shape information is replaced, according to the replacement principle, with the replacement word information corresponding to the lip language mouth shape information.
It should be noted that the replacement principle is to determine whether the part of the voice control command to be replaced is similar in meaning to the replacement word: if so, the replacement is performed; if not, the voice analysis fails.
The following illustrates how to filter noise in the voice control command when the voice word number information corresponding to the voice control command is equal to the mouth shape conversion frequency information corresponding to the lip language mouth shape information.
For example, suppose the voice control command is "turn on the air conditioner A to increase", so that the voice word number information corresponding to the voice control command is 7, and the mouth shape conversion frequency information corresponding to the lip language mouth shape information is also 7. The two are equal, so the voice control command is matched against the lip language mouth shape information. Suppose the part "increase" of the voice control command does not match the lip language mouth shape information; correspondingly, part of the lip language mouth shape information does not match the voice control command, and the replacement word corresponding to that unmatched lip language mouth shape information is evaluated under the replacement principle. If the replacement word is "dehumidification", the server determines that "increase" and "dehumidification" are not similar in meaning, so no replacement is made and the voice information analysis fails;
if the replacement word corresponding to the unmatched lip language mouth shape information is "heightening", the server determines that "increase" and "heightening" are similar in meaning, so "increase" is replaced with "heightening". The voice control command after replacement, "turn on the air conditioner A heightening", is the noise-filtered voice control command.
The replacement word information is obtained according to a large number of experimental results and is stored in the server in advance.
It should be noted that the server may determine whether the voice control command and the replacement word are similar in meaning according to data stored in the server; for example, if the stored similarity between "increase" and "heightening" is 90%, the replacement may be performed.
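The replacement principle can be sketched as a lookup against similarity data stored in advance. In this illustration the table `SIMILARITY`, the cut-off `min_similarity`, and the function name `replace_unmatched` are hypothetical; only the 90% figure comes from the text above, and the value for the dissimilar pair is invented for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical similarity data stored on the server in advance.
SIMILARITY: Dict[Tuple[str, str], float] = {
    ("increase", "heightening"): 0.90,       # figure taken from the text
    ("increase", "dehumidification"): 0.10,  # invented for illustration
}


def replace_unmatched(
    spoken: List[str],
    replacements: Dict[int, str],
    min_similarity: float = 0.80,  # assumed cut-off, not given in the text
) -> Optional[List[str]]:
    """Apply the replacement principle.

    `replacements` maps the index of each unmatched spoken word to the
    replacement word obtained from the lip language analysis. Returns
    the command after replacement, or None when any pair is not similar
    enough, which corresponds to a failed voice analysis.
    """
    result = list(spoken)
    for index, candidate in replacements.items():
        similarity = SIMILARITY.get((spoken[index], candidate), 0.0)
        if similarity < min_similarity:
            return None
        result[index] = candidate
    return result
```

Replacing "increase" with "heightening" succeeds under this table, while proposing "dehumidification" for the same position returns `None`.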
It should be noted that, when the voice word number information corresponding to the voice control command is compared with the mouth shape conversion frequency information corresponding to the lip language mouth shape information and is found to be smaller, the replacement may be performed according to the replacement principle, or the voice information analysis may instead be deemed to have failed.
According to the embodiment of the present invention, after the acquired voice information is analyzed into a voice control command, that command is first matched against the lip language mouth shape information captured when the user issued the voice control command for the smart home device. If the matching result indicates that the analyzed voice control command contains no noise, the smart home device is controlled according to the analyzed voice control command. If the matching result indicates that it contains noise, the noise is filtered according to the voice word number information corresponding to the analyzed voice control command and the lip language mouth shape information, and the smart home device is then controlled according to the noise-filtered voice control command. The accuracy of extracting the voice control command can thereby be improved.
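The overall decision flow summarized above can be sketched end to end as follows. This is an illustration under several assumptions rather than the claimed implementation: the matching degree is a toy positional measure, the lip language analysis is assumed to yield one word per mouth shape change, a spoken word count smaller than the number of mouth shape changes is treated as a failed analysis, and the names `degree`, `first_voice_command`, and `similar` are hypothetical.

```python
from typing import Callable, List, Optional


def degree(spoken: List[str], lip: List[str]) -> float:
    """Toy matching degree: share of positions where both sequences agree."""
    longest = max(len(spoken), len(lip))
    if longest == 0:
        return 1.0
    same = sum(1 for s, l in zip(spoken, lip) if s == l)
    return same / longest


def first_voice_command(
    spoken: List[str],
    lip: List[str],
    similar: Callable[[str, str], bool],
    threshold: float = 0.80,
) -> Optional[List[str]]:
    """Return the first voice control command, or None when analysis fails."""
    # No noise: use the recognized command unchanged.
    if degree(spoken, lip) >= threshold:
        return list(spoken)
    # Case one: more spoken words than mouth shape changes, so discard
    # the spoken words that have no lip-read counterpart.
    if len(spoken) > len(lip):
        remaining = list(lip)
        kept: List[str] = []
        for word in spoken:
            if word in remaining:
                remaining.remove(word)
                kept.append(word)
        spoken = kept
        if degree(spoken, lip) >= threshold:
            return spoken
    # Assumption: fewer spoken words than mouth shape changes is a failure.
    if len(spoken) != len(lip):
        return None
    # Case two with equal counts: replace unmatched words when similar.
    result: List[str] = []
    for s, l in zip(spoken, lip):
        if s == l:
            result.append(s)
        elif similar(s, l):
            result.append(l)
        else:
            return None
    return result
```

The `similar` callback carries the replacement principle, so the stored similarity data can be swapped without changing the flow.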
Based on the same inventive concept, an embodiment of the present invention further provides an apparatus for controlling a device. Because this apparatus corresponds to the method for controlling a device provided in the embodiment of the present invention, and solves the problem on a principle similar to that of the method, the implementation of the apparatus may refer to the implementation of the method, and repeated details are omitted.
As shown in fig. 2, an apparatus for controlling a device according to a first embodiment of the present invention includes: at least one processing unit 200 and at least one storage unit 201, wherein the storage unit 201 stores program code that, when executed by the processing unit 200, causes the processing unit 200 to perform the following:
matching the determined voice control command with lip language mouth shape information when a user sends the voice control command aiming at the intelligent household equipment, and determining a first voice control command according to a matching result;
and controlling the intelligent household equipment according to the first voice control command.
Optionally, the processing unit 200 is specifically configured to:
judging whether the matching degree of the determined voice control command and lip language mouth shape information when the voice control command aiming at the intelligent household equipment is sent by the user is smaller than a threshold value or not, if so, determining that noise exists in the determined voice control command;
otherwise, determining that the determined voice control command is free of noise.
Optionally, the processing unit 200 is specifically configured to:
if the determined voice control command has noise, determining a first voice control command according to the voice word number information corresponding to the determined voice control command and the lip language mouth shape information;
and if the determined voice control command has no noise, taking the determined voice control command as the first voice control command.
Optionally, the processing unit 200 is specifically configured to:
if the voice word number information corresponding to the determined voice control command is larger than the mouth shape conversion frequency information corresponding to the lip language mouth shape information, discarding the voice control command which is not matched with the lip language mouth shape information in the determined voice control command to obtain a second voice control command;
and if the voice word number information corresponding to the second voice control command is equal to the mouth shape conversion frequency information corresponding to the lip language mouth shape information, and the matching degree of the second voice control command and the lip language mouth shape information when the voice control command for the intelligent household equipment is sent by the user is smaller than a threshold value, replacing the voice control command which cannot be matched with the lip language mouth shape information with replacement word information corresponding to the lip language mouth shape information according to a replacement principle to obtain a first voice control command.
Optionally, the processing unit 200 is specifically configured to:
and if the voice word number information corresponding to the determined voice control command is not larger than the mouth shape conversion frequency information corresponding to the lip language mouth shape information, replacing the voice control command which cannot be matched with the lip language mouth shape information with replacement word information corresponding to the lip language mouth shape information according to a replacement principle to obtain a first voice control command.
Optionally, the processing unit 200 is specifically configured to:
and analyzing the acquired voice information according to a voice recognition model, wherein the voice recognition model is obtained by training a neural network according to the voice information, the voice control command and the voice word number information.
Optionally, the processing unit 200 is specifically configured to:
and analyzing the obtained lip language mouth shape information according to an image recognition model, wherein the image recognition model is obtained through neural network training according to the lip language mouth shape information, the mouth shape transformation frequency information and the replacement word information.
As shown in fig. 3, an apparatus for controlling a device according to a second embodiment of the present invention includes a determination module 300 and a control module 301:
the determination module 300 is configured to match the determined voice control command with lip language mouth shape information captured when a user issues the voice control command for the smart home device, and to determine a first voice control command according to the matching result;
the control module 301 is configured to control the smart home device according to the first voice control command.
Optionally, the determining module 300 is specifically configured to:
judging whether the matching degree of the determined voice control command and lip language mouth shape information when a user sends a voice control command aiming at the intelligent household equipment is smaller than a threshold value or not, if so, determining that noise exists in the determined voice control command;
otherwise, determining that the determined voice control command is free of noise.
Optionally, the determining module 300 is specifically configured to:
if the determined voice control command has noise, determining a first voice control command according to the voice word number information corresponding to the determined voice control command and the lip language mouth shape information;
and if the determined voice control command has no noise, taking the determined voice control command as the first voice control command.
Optionally, the determining module 300 is specifically configured to:
if the voice word number information corresponding to the determined voice control command is larger than the mouth shape conversion frequency information corresponding to the lip language mouth shape information, discarding the voice control command which is not matched with the lip language mouth shape information in the determined voice control command to obtain a second voice control command;
and if the voice word number information corresponding to the second voice control command is equal to the mouth shape conversion frequency information corresponding to the lip language mouth shape information, and the matching degree of the second voice control command and the lip language mouth shape information when the voice control command for the intelligent household equipment is sent by the user is smaller than a threshold value, replacing the voice control command which cannot be matched with the lip language mouth shape information with replacement word information corresponding to the lip language mouth shape information according to a replacement principle to obtain a first voice control command.
Optionally, the determining module 300 is specifically configured to:
and if the voice word number information corresponding to the determined voice control command is not larger than the mouth shape conversion frequency information corresponding to the lip language mouth shape information, replacing the voice control command which cannot be matched with the lip language mouth shape information with replacement word information corresponding to the lip language mouth shape information according to a replacement principle to obtain a first voice control command.
Optionally, the determining module 300 is specifically configured to:
and analyzing the acquired voice information according to a voice recognition model, wherein the voice recognition model is obtained by training a neural network according to the voice information, the voice control command and the voice word number information.
Optionally, the determining module 300 is specifically configured to:
and analyzing the obtained lip language mouth shape information according to an image recognition model, wherein the image recognition model is obtained through neural network training according to the lip language mouth shape information, the mouth shape transformation frequency information and the replacement word information.
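The division of work between the determination module 300 and the control module 301 can be sketched as two small classes wired together. All class and method names here are hypothetical, the command-resolution logic is injected as a callable rather than fixed, and the control module only records the commands it would send instead of talking to a real device.

```python
from typing import Callable, List, Optional

Resolver = Callable[[List[str], List[str]], Optional[List[str]]]


class DeterminationModule:
    """Matches the recognized command with lip-read words and picks the first command."""

    def __init__(self, resolve: Resolver) -> None:
        self._resolve = resolve

    def determine(self, spoken: List[str], lip_read: List[str]) -> Optional[List[str]]:
        return self._resolve(spoken, lip_read)


class ControlModule:
    """Controls the smart home device; this sketch only records what was sent."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def control(self, command: List[str]) -> None:
        self.sent.append(" ".join(command))


class DeviceController:
    """Wires the two modules together in the order described above."""

    def __init__(self, determination: DeterminationModule, control: ControlModule) -> None:
        self.determination = determination
        self.control = control

    def handle(self, spoken: List[str], lip_read: List[str]) -> bool:
        command = self.determination.determine(spoken, lip_read)
        if command is None:
            return False  # analysis failed, nothing is sent to the device
        self.control.control(command)
        return True
```

Keeping the resolver injectable means the same wiring works whether the first voice control command comes from the no-noise path, the discarding path, or the replacement path.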
As shown in fig. 4, a complete method for controlling a device according to an embodiment of the present invention includes the following steps:
Step 411: control the smart home device according to the final voice control command.
The present application is described above with reference to block diagrams and/or flowchart illustrations of methods, apparatus (systems) and/or computer program products according to embodiments of the application. It will be understood that one block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, and/or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the subject application may also be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.). Further, the present application may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with a command execution system. In the context of this application, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the command execution system, apparatus, or device.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (12)
1. A method of controlling a device, the method comprising:
matching the determined voice control command with lip language mouth shape information when a user sends the voice control command aiming at the intelligent household equipment, and determining a first voice control command according to a matching result;
controlling the intelligent household equipment according to the first voice control command;
the determining the first voice control command according to the matching result includes:
if the determined voice control command has noise, determining a first voice control command according to the voice word number information corresponding to the determined voice control command and the lip language mouth shape information;
and if the determined voice control command has no noise, taking the determined voice control command as the first voice control command.
2. The method of claim 1, wherein the match result is determined by:
judging whether the matching degree of the determined voice control command and lip language mouth shape information when the voice control command aiming at the intelligent household equipment is sent by the user is smaller than a threshold value or not, if so, determining that noise exists in the determined voice control command;
otherwise, determining that the determined voice control command is free of noise.
3. The method of claim 1, wherein the determining a first voice control command according to the voice word number information corresponding to the determined voice control command and the lip language mouth shape information comprises:
if the voice word number information corresponding to the determined voice control command is larger than the mouth shape conversion frequency information corresponding to the lip language mouth shape information, discarding the voice control command which is not matched with the lip language mouth shape information in the determined voice control command to obtain a second voice control command;
and if the voice word number information corresponding to the second voice control command is equal to the mouth shape conversion frequency information corresponding to the lip language mouth shape information, and the matching degree of the second voice control command and the lip language mouth shape information when the voice control command for the intelligent household equipment is sent by the user is smaller than a threshold value, replacing the voice control command which cannot be matched with the lip language mouth shape information with replacement word information corresponding to the lip language mouth shape information according to a replacement principle to obtain a first voice control command.
4. The method of claim 1, wherein the determining a first voice control command according to the voice word number information corresponding to the determined voice control command and the lip language mouth shape information comprises:
and if the voice word number information corresponding to the determined voice control command is not larger than the mouth shape conversion frequency information corresponding to the lip language mouth shape information, replacing the voice control command which cannot be matched with the lip language mouth shape information with replacement word information corresponding to the lip language mouth shape information according to a replacement principle to obtain a first voice control command.
5. The method of claim 3 or 4, wherein the information on the number of speech words corresponding to the determined speech control command is determined by:
and analyzing the acquired voice information according to a voice recognition model, wherein the voice recognition model is obtained by training a neural network according to the voice information, the voice control command and the voice word number information.
6. The method according to claim 3 or 4, wherein the mouth shape conversion times information and the alternative word information corresponding to the lip language mouth shape information are determined by:
and analyzing the obtained lip language mouth shape information according to an image recognition model, wherein the image recognition model is obtained through neural network training according to the lip language mouth shape information, the mouth shape transformation frequency information and the replacement word information.
7. An apparatus for controlling a device, the apparatus comprising: at least one processing unit and at least one memory unit, wherein the memory unit stores program code that, when executed by the processing unit, causes the processing unit to perform the following:
matching the determined voice control command with lip language mouth shape information when a user sends the voice control command aiming at the intelligent household equipment, and determining a first voice control command according to a matching result;
controlling the intelligent household equipment according to the first voice control command;
wherein the processing unit is specifically configured to:
if the determined voice control command has noise, determining a first voice control command according to the voice word number information corresponding to the determined voice control command and the lip language mouth shape information;
and if the determined voice control command has no noise, taking the determined voice control command as the first voice control command.
8. The apparatus as claimed in claim 7, wherein said processing unit is specifically configured to:
judging whether the matching degree of the determined voice control command and lip language mouth shape information when the voice control command aiming at the intelligent household equipment is sent by the user is smaller than a threshold value or not, if so, determining that noise exists in the determined voice control command;
otherwise, determining that the determined voice control command is free of noise.
9. The apparatus as claimed in claim 7, wherein said processing unit is specifically configured to:
if the voice word number information corresponding to the determined voice control command is larger than the mouth shape conversion frequency information corresponding to the lip language mouth shape information, discarding the voice control command which is not matched with the lip language mouth shape information in the determined voice control command to obtain a second voice control command;
and if the voice word number information corresponding to the second voice control command is equal to the mouth shape conversion frequency information corresponding to the lip language mouth shape information, and the matching degree of the second voice control command and the lip language mouth shape information when the voice control command for the intelligent household equipment is sent by the user is smaller than a threshold value, replacing the voice control command which cannot be matched with the lip language mouth shape information with replacement word information corresponding to the lip language mouth shape information according to a replacement principle to obtain a first voice control command.
10. The apparatus as claimed in claim 7, wherein said processing unit is specifically configured to:
and if the voice word number information corresponding to the determined voice control command is not larger than the mouth shape conversion frequency information corresponding to the lip language mouth shape information, replacing the voice control command which cannot be matched with the lip language mouth shape information with replacement word information corresponding to the lip language mouth shape information according to a replacement principle to obtain a first voice control command.
11. The apparatus according to claim 9 or 10, wherein the processing unit is specifically configured to:
and analyzing the acquired voice information according to a voice recognition model, wherein the voice recognition model is obtained by training a neural network according to the voice information, the voice control command and the voice word number information.
12. The apparatus according to claim 9 or 10, wherein the processing unit is specifically configured to:
and analyzing the obtained lip language mouth shape information according to an image recognition model, wherein the image recognition model is obtained through neural network training according to the lip language mouth shape information, the mouth shape transformation frequency information and the replacement word information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811381967.3A CN109377995B (en) | 2018-11-20 | 2018-11-20 | Method and device for controlling equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109377995A (en) | 2019-02-22 |
CN109377995B (en) | 2021-06-01 |
Family
ID=65389650
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811381967.3A Active CN109377995B (en) | 2018-11-20 | 2018-11-20 | Method and device for controlling equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109377995B (en) |
2018
- 2018-11-20: CN application CN201811381967.3A (patent CN109377995B), status: Active
Also Published As
Publication number | Publication date |
---|---|
CN109377995A (en) | 2019-02-22 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109377995B (en) | Method and device for controlling equipment | |
CN108447480B (en) | Intelligent household equipment control method, intelligent voice terminal and network equipment | |
CN105471705B (en) | Intelligent control method, equipment and system based on instant messaging | |
CN102779509B (en) | Voice processing equipment and voice processing method | |
CN112820291B (en) | Smart home control method, smart home control system and storage medium | |
CN109817211B (en) | Electric appliance control method and device, storage medium and electric appliance | |
CN105045140A (en) | Method and device for intelligently controlling controlled equipment | |
CN106847281A (en) | Intelligent household voice control system and method based on voice fuzzy identification technology | |
CN109065051B (en) | Voice recognition processing method and device | |
CN109343481B (en) | Method and device for controlling device | |
CN110579977B (en) | Control method and device of electrical equipment and computer readable storage medium | |
CN108932947B (en) | Voice control method and household appliance | |
CN109347708B (en) | Voice recognition method and device, household appliance, cloud server and medium | |
CN114582318B (en) | Intelligent home control method and system based on voice recognition | |
CN112002316A (en) | Electric appliance control method and device, storage medium and terminal | |
CN111583921A (en) | Voice control method, device, computer equipment and storage medium | |
CN114791771A (en) | Interaction management system and method for intelligent voice mouse | |
CN108415572B (en) | Module control method and device applied to mobile terminal and storage medium | |
CN113205809A (en) | Voice wake-up method and device | |
CN109976169B (en) | Internet television intelligent control method and system based on self-learning technology | |
WO2018023514A1 (en) | Home background music control system | |
US9626967B2 (en) | Information processing method and electronic device | |
CN116386623A (en) | Voice interaction method of intelligent equipment, storage medium and electronic device | |
CN110379422A (en) | Far field speech control system, control method and equipment under line | |
WO2018023518A1 (en) | Smart terminal for voice interaction and recognition | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |