CN109377995B - Method and device for controlling equipment - Google Patents



Publication number
CN109377995B
Authority
CN
China
Prior art keywords
control command
voice control
mouth shape
voice
information
Legal status
Active
Application number
CN201811381967.3A
Other languages
Chinese (zh)
Other versions
CN109377995A (en)
Inventor
韩雪
王慧君
毛跃辉
陶梦春
Current Assignee
Gree Electric Appliances Inc of Zhuhai
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Application filed by Gree Electric Appliances Inc of Zhuhai
Priority to CN201811381967.3A
Publication of CN109377995A
Application granted
Publication of CN109377995B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L15/25 Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/2803 Home automation networks
    • H04L12/2816 Controlling appliance services of a home automation network by calling their functionalities
    • H04L12/282 Controlling appliance services of a home automation network by calling their functionalities based on user interaction within the home

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Automation & Control Theory (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Selective Calling Equipment (AREA)

Abstract

The invention discloses a method and a device for controlling equipment, which address the problem in the prior art that, when a smart home device is controlled by voice, the voice control command obtained by parsing is not very accurate. In the method, a determined voice control command is first matched against the lip-reading (mouth-shape) information captured while the user utters a voice control command directed at the smart home device; a first voice control command is determined according to the matching result; and the smart home device is controlled according to the first voice control command. Because the voice control command is checked against the user's mouth shapes while the command is being spoken, and the command used to control the device is determined from the matching result, the accuracy of extracting the voice control command can be improved.

Description

Method and device for controlling equipment
Technical Field
The present invention relates to the field of wireless communication technology, and in particular to a method and an apparatus for controlling a device.
Background
A smart home is an ecosystem that uses the house as a platform and connects the devices in the house through Internet of Things (IoT) technology to make them intelligent. A smart home can provide functions such as intelligent lighting control, intelligent appliance control, security monitoring, intelligent background music, intelligent video sharing, video intercom, and home theater.
A smart home uses structured cabling, network communication, security, automatic control, and audio/video technologies to integrate home-related facilities and to build an efficient management system for residential facilities and household schedules. This improves home safety, convenience, comfort, and aesthetics, and provides an environmentally friendly, energy-saving living environment.
In existing smart home environments, when a target user controls a device in the smart home system by voice, the smart home system server parses the user's control voice command directly out of the collected voice information and controls the corresponding smart home device according to that command. In daily life, however, a home is usually equipped with several smart home devices and shared by several users. While one user is controlling a smart home device, other users may be controlling other devices or simply talking, so the voice information collected by the server is very complex. If the voice command is extracted directly, noise interference makes the parsing and recognition of the user's control voice command inaccurate.
In summary, when a smart home device is controlled by voice, the user control voice command obtained by parsing is not very accurate.
Disclosure of Invention
The invention provides a method and a device for controlling equipment, which address the problem in the prior art that, when a smart home device is controlled by voice, the voice control command obtained by parsing is not very accurate.
In a first aspect, an embodiment of the present invention provides a method for controlling a device, the method including:
matching a determined voice control command against lip-reading mouth-shape information captured when a user issues the voice control command for a smart home device, and determining a first voice control command according to the matching result; and
controlling the smart home device according to the first voice control command.
In this method, the determined voice control command is first matched against the mouth-shape information captured while the user issues the voice control command for the smart home device; a first voice control command is determined according to the matching result; and the smart home device is controlled according to the first voice control command. Because the command used to control the device is determined by checking the voice command against the user's mouth shapes while the command is spoken, the accuracy of extracting the voice control command can be improved.
In a possible implementation, the matching result is determined as follows:
determining whether the matching degree between the determined voice control command and the lip-reading mouth-shape information captured when the user issues the voice control command for the smart home device is less than a threshold, and if so, determining that the determined voice control command contains noise;
otherwise, determining that the determined voice control command contains no noise.
This implementation provides a way to determine whether the voice control command contains noise. By using the matching degree between the mouth-shape information and the voice control command, the presence of noise can be judged more accurately.
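The noise check above can be sketched in Python. The description does not say how the matching degree is computed, so this sketch assumes, purely for illustration, that each recognized word maps to one expected mouth shape and that the matching degree is the fraction of aligned positions where expected and observed shapes agree, normalised by the longer sequence. The 0.8 default mirrors the 80% threshold used in the example later in this description; the function names `matching_degree` and `contains_noise` are hypothetical.

```python
from typing import Sequence


def matching_degree(expected: Sequence[str], observed: Sequence[str]) -> float:
    """Hypothetical matching degree: share of aligned positions whose mouth
    shapes agree, divided by the longer sequence so that extra or missing
    words lower the score."""
    longest = max(len(expected), len(observed))
    if longest == 0:
        return 1.0  # nothing said and nothing seen: treat as a perfect match
    hits = sum(1 for e, o in zip(expected, observed) if e == o)
    return hits / longest


def contains_noise(degree: float, threshold: float = 0.8) -> bool:
    """Noise is assumed present when the matching degree is below the threshold."""
    return degree < threshold


# Three recognised words, but the target user only formed two mouth shapes.
degree = matching_degree(["T", "O", "D"], ["T", "O"])
print(round(degree, 3), contains_noise(degree))  # 0.667 True
```

Normalising by the longer sequence is a design choice made here so that words spoken by other people (which the target user's lips never formed) pull the score down.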
In a possible implementation, determining the first voice control command according to the matching result includes:
if the determined voice control command contains noise, determining the first voice control command according to the word count of the determined voice control command and the lip-reading mouth-shape information; and
if the determined voice control command contains no noise, using the determined voice control command as the first voice control command.
In this implementation, the first voice control command is determined according to the matching result: if the determined voice control command contains noise, the first voice control command is determined from the word count of the determined command and the mouth-shape information; if it contains no noise, the determined voice control command is used directly as the first voice control command. This improves the accuracy of parsing the voice control command.
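The two-way decision above can be sketched as a small dispatch function. This is an illustrative sketch only: `determine_first_command` and the `refine` callback (standing in for the word-count and mouth-shape procedure described next) are hypothetical names, and the command is assumed to be a list of words.

```python
from typing import Callable, List

Refiner = Callable[[List[str]], List[str]]


def determine_first_command(command: List[str], degree: float,
                            threshold: float, refine: Refiner) -> List[str]:
    """Return the first voice control command.

    If the matching degree is below the threshold the command is treated as
    noisy and handed to `refine`; otherwise the parsed command is used unchanged.
    """
    if degree < threshold:
        return refine(command)
    return list(command)


# Usage with a toy refiner that simply keeps the first two words.
keep_two: Refiner = lambda words: words[:2]
print(determine_first_command(["turn", "on", "dehumidify"], 0.67, 0.8, keep_two))
# ['turn', 'on']
print(determine_first_command(["turn", "on"], 0.9, 0.8, keep_two))
# ['turn', 'on']
```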
In a possible implementation, determining the first voice control command according to the word count of the determined voice control command and the lip-reading mouth-shape information includes:
if the word count of the determined voice control command is greater than the mouth-shape transition count derived from the mouth-shape information, discarding the words of the determined voice control command that do not match the mouth-shape information, to obtain a second voice control command; and
if the word count of the second voice control command is still greater than the mouth-shape transition count, and the matching degree between the second voice control command and the mouth-shape information captured when the user issues the voice control command for the smart home device is less than the threshold, replacing, according to a replacement rule, the parts of the command that cannot be matched to the mouth-shape information with the substitute words corresponding to the mouth-shape information, to obtain the first voice control command.
In this implementation, the word count of the voice control command is compared with the number of mouth-shape transitions the user makes while speaking to the smart home device. Words that do not match the user's mouth shapes are discarded, and parts that still cannot be matched are replaced according to the replacement rule. The mouth-shape information is thus used to filter noise out of the voice control command, improving the accuracy of parsing it.
In a possible implementation, determining the first voice control command according to the word count of the determined voice control command and the mouth-shape information includes:
if the word count of the determined voice control command is not greater than the mouth-shape transition count derived from the mouth-shape information, replacing, according to the replacement rule, the parts of the command that cannot be matched to the mouth-shape information with the corresponding substitute words, to obtain the first voice control command.
This implementation provides another way to filter noise out of the voice control command: if the word count of the determined voice control command is not greater than the mouth-shape transition count, the parts that cannot be matched to the mouth-shape information are replaced according to the replacement rule. The mouth-shape information is again used to filter noise and improve parsing accuracy.
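Both branches of the refinement can be sketched together. The description leaves the word-to-mouth-shape mapping, the definition of a "match", and the replacement rule unspecified, so everything below is an assumption made for illustration: a hypothetical `VISEME_OF` table gives each word's expected mouth shape, the transition count is taken as one mouth shape per spoken word, a word is discarded if the user never formed its shape, and the replacement rule emits one word per observed shape, keeping the recognized word where it agrees and otherwise using the lip model's substitute word. Where the first branch's second condition does not hold, the sketch assumes the second voice control command is used as the first.

```python
from typing import Dict, List, Sequence

# Hypothetical lookup table: the mouth shape each vocabulary word is expected to produce.
VISEME_OF: Dict[str, str] = {"turn": "T", "on": "O", "off": "F", "dehumidify": "D"}


def matching_degree(words: Sequence[str], observed: Sequence[str]) -> float:
    """Share of aligned positions whose expected and observed shapes agree."""
    expected = [VISEME_OF.get(w, "?") for w in words]
    longest = max(len(expected), len(observed), 1)
    return sum(e == o for e, o in zip(expected, observed)) / longest


def replace_unmatched(words: Sequence[str], observed: Sequence[str],
                      substitutes: Dict[str, str]) -> List[str]:
    """Assumed replacement rule: one output word per observed mouth shape."""
    result = []
    for i, shape in enumerate(observed):
        if i < len(words) and VISEME_OF.get(words[i]) == shape:
            result.append(words[i])  # recognised word agrees with the lips
        else:
            result.append(substitutes.get(shape, "<unknown>"))
    return result


def refine_command(words: Sequence[str], observed: Sequence[str],
                   substitutes: Dict[str, str], threshold: float = 0.8) -> List[str]:
    transitions = len(observed)  # assumption: one mouth shape per spoken word
    if len(words) > transitions:
        # Branch 1: drop words whose expected shape the user never formed.
        seen = set(observed)
        second = [w for w in words if VISEME_OF.get(w) in seen]
        if len(second) > transitions and matching_degree(second, observed) < threshold:
            return replace_unmatched(second, observed, substitutes)
        return second
    # Branch 2: word count not greater than transition count, so replace only.
    return replace_unmatched(words, observed, substitutes)


subs = {"T": "turn", "O": "on"}
print(refine_command(["turn", "on", "dehumidify"], ["T", "O"], subs))  # ['turn', 'on']
print(refine_command(["turn", "off"], ["T", "O"], subs))               # ['turn', 'on']
```

The first call mirrors the two-speaker example in this description: "dehumidify" is dropped because the target user's lips never formed its shape. The second shows a misrecognized word being replaced by the lip model's substitute.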
In a possible implementation, the word count corresponding to the determined voice control command is determined as follows:
parsing the acquired voice information with a speech recognition model, the speech recognition model being obtained by training a neural network on voice information, voice control commands, and word counts.
In this implementation, the acquired voice information is parsed with the speech recognition model, which is trained as a neural network on voice information, voice control commands, and word counts, so that both the voice control command and its word count can be obtained from the voice information.
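To show the model's input/output contract (voice features in, voice control command and word count out), the sketch below swaps the patent's neural network for a plain 1-nearest-neighbour lookup over labelled samples. This is explicitly not the described model; the feature vectors, `VoiceSample`, and `NearestNeighbourVoiceModel` are hypothetical stand-ins.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class VoiceSample:
    """One training example: hypothetical acoustic features plus the two
    labels the description says the model is trained on."""
    features: Tuple[float, ...]
    command: str
    word_count: int


class NearestNeighbourVoiceModel:
    """Stand-in for the trained neural network (1-nearest-neighbour):
    maps a feature vector to (command, word count)."""

    def __init__(self, samples: Sequence[VoiceSample]) -> None:
        if not samples:
            raise ValueError("at least one training sample is required")
        self._samples = list(samples)

    def predict(self, features: Sequence[float]) -> Tuple[str, int]:
        best = min(self._samples, key=lambda s: math.dist(s.features, features))
        return best.command, best.word_count


model = NearestNeighbourVoiceModel([
    VoiceSample((0.0, 1.0), "turn on", 2),
    VoiceSample((1.0, 0.0), "dehumidify", 1),
])
print(model.predict((0.1, 0.9)))  # ('turn on', 2)
```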
In a possible implementation, the mouth-shape transition count and the substitute words corresponding to the lip-reading mouth-shape information are determined as follows:
parsing the acquired mouth-shape information with an image recognition model, the image recognition model being obtained by training a neural network on mouth-shape information, mouth-shape transition counts, and substitute words.
In this implementation, the acquired mouth-shape information is parsed with the image recognition model, which is trained as a neural network on mouth-shape information, mouth-shape transition counts, and substitute words, so that the transition count and the substitute words corresponding to the mouth-shape information can be obtained.
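The image recognition model itself is a trained neural network, which this sketch does not attempt to reproduce. Instead it assumes a hypothetical per-frame classifier has already labelled each camera frame with a mouth shape, and shows only the post-processing: counting mouth-shape transitions and looking up substitute words. The `rest` label, the `substitute_of` table, and `analyse_lip_frames` are all illustrative assumptions.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple

REST = "rest"  # hypothetical label for a closed or neutral mouth


def analyse_lip_frames(frame_shapes: Sequence[str],
                       substitute_of: Dict[str, str]) -> Tuple[List[str], int, Dict[str, str]]:
    """Turn per-frame mouth-shape labels into (shape sequence, transition count,
    substitute words).

    Consecutive identical frames are collapsed into one run and rest runs are
    dropped, so each remaining entry is one change into a new mouth shape.
    """
    shapes = [shape for shape, _run in groupby(frame_shapes) if shape != REST]
    substitutes = {s: substitute_of[s] for s in shapes if s in substitute_of}
    return shapes, len(shapes), substitutes


frames = ["rest", "T", "T", "T", "O", "O", "rest"]
print(analyse_lip_frames(frames, {"T": "turn", "O": "on", "D": "dehumidify"}))
# (['T', 'O'], 2, {'T': 'turn', 'O': 'on'})
```

A shape repeated after a rest (for example `T, rest, T`) counts twice, on the assumption that it corresponds to two separate spoken syllables.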
In a second aspect, an embodiment of the present invention provides an apparatus for controlling a device. The apparatus includes at least one processing unit and at least one storage unit, where the storage unit stores program code that, when executed by the processing unit, causes the processing unit to perform the following:
matching a determined voice control command against lip-reading mouth-shape information captured when a user issues the voice control command for a smart home device, and determining a first voice control command according to the matching result; and
controlling the smart home device according to the first voice control command.
In a possible implementation, the processing unit is specifically configured to:
determine whether the matching degree between the determined voice control command and the mouth-shape information captured when the user issues the voice control command for the smart home device is less than a threshold, and if so, determine that the determined voice control command contains noise;
otherwise, determine that the determined voice control command contains no noise.
In a possible implementation, the processing unit is specifically configured to:
if the determined voice control command contains noise, determine the first voice control command according to the word count of the determined voice control command and the mouth-shape information; and
if the determined voice control command contains no noise, use the determined voice control command as the first voice control command.
In a possible implementation, the processing unit is specifically configured to:
if the word count of the determined voice control command is greater than the mouth-shape transition count corresponding to the mouth-shape information, discard the words of the determined voice control command that do not match the mouth-shape information, to obtain a second voice control command; and
if the word count of the second voice control command is greater than the mouth-shape transition count, and the matching degree between the second voice control command and the mouth-shape information captured when the user issues the voice control command for the smart home device is less than the threshold, replace, according to a replacement rule, the parts that cannot be matched to the mouth-shape information with the substitute words corresponding to the mouth-shape information, to obtain the first voice control command.
In a possible implementation, the processing unit is specifically configured to:
if the word count of the determined voice control command is not greater than the mouth-shape transition count corresponding to the mouth-shape information, replace, according to a replacement rule, the parts that cannot be matched to the mouth-shape information with the corresponding substitute words, to obtain the first voice control command.
In a possible implementation, the processing unit is specifically configured to:
parse the acquired voice information with a speech recognition model, the speech recognition model being obtained by training a neural network on voice information, voice control commands, and word counts.
In a possible implementation, the processing unit is specifically configured to:
parse the acquired mouth-shape information with an image recognition model, the image recognition model being obtained by training a neural network on mouth-shape information, mouth-shape transition counts, and substitute words.
In a third aspect, an embodiment of the present invention provides an apparatus for controlling a device, the apparatus including:
a determination module, configured to match the determined voice control command against the lip-reading mouth-shape information captured when a user issues the voice control command for a smart home device, and to determine a first voice control command according to the matching result; and
a control module, configured to control the smart home device according to the first voice control command.
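The two-module structure of the third aspect can be sketched as follows. The determination logic and the transport used to reach the device are not specified, so both are injected as callables; the class names, the `send(device_id, text)` signature, and the toy rule in the usage example are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signatures for the work each module delegates.
Determiner = Callable[[Sequence[str], Sequence[str]], List[str]]
Sender = Callable[[str, str], None]


class DeterminationModule:
    """Matches the voice command against mouth-shape information and returns
    the first voice control command (logic injected as `determine`)."""

    def __init__(self, determine: Determiner) -> None:
        self._determine = determine

    def run(self, command: Sequence[str], mouth_shapes: Sequence[str]) -> List[str]:
        return self._determine(command, mouth_shapes)


class ControlModule:
    """Forwards the first voice control command to the target device."""

    def __init__(self, send: Sender) -> None:
        self._send = send

    def run(self, device_id: str, first_command: Sequence[str]) -> None:
        self._send(device_id, " ".join(first_command))


class DeviceController:
    def __init__(self, determination: DeterminationModule, control: ControlModule) -> None:
        self.determination = determination
        self.control = control

    def handle(self, device_id: str, command: Sequence[str], mouth_shapes: Sequence[str]) -> None:
        first = self.determination.run(command, mouth_shapes)
        self.control.run(device_id, first)


sent: List[Tuple[str, str]] = []
controller = DeviceController(
    DeterminationModule(lambda cmd, shapes: list(cmd[:len(shapes)])),  # toy rule
    ControlModule(lambda dev, text: sent.append((dev, text))),
)
controller.handle("air_conditioner_1", ["turn", "on", "dehumidify"], ["T", "O"])
print(sent)  # [('air_conditioner_1', 'turn on')]
```

Injecting the behaviour keeps each module testable on its own, which matches the way the aspect separates determination from control.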
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium storing a computer program that, when executed by a processor, implements the steps of the method according to the first aspect.
For the technical effects of any implementation of the second to fourth aspects, refer to the technical effects of the corresponding implementations of the first aspect; details are not repeated here.
These and other aspects of the invention will become apparent from, and be elucidated with reference to, the embodiments described below.
Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for controlling a device according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a first control device according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a second control device according to an embodiment of the present invention;
Fig. 4 is a flowchart of a complete method for controlling a device according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described below in further detail with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
With the spread of intelligent devices, smart homes are gradually entering people's lives. When a user controls a smart home by voice, the collected voice information must be parsed to obtain a voice control command, and the smart home device is then controlled according to that command. If the collected voice information contains only the user's speech directed at the smart home device to be controlled, the device can be controlled with the command parsed from it. If, however, the voice information also contains other speech, such as other users' commands to other smart home devices or conversations between other users, the parsed voice control command may be inaccurate, and the smart home device may be controlled incorrectly as a result.
For example, user A wants to turn on air conditioner 1 and says "turn on the air conditioner", while user B wants air conditioner 2 to dehumidify and says "dehumidify". Both utterances are collected. When the collected voice information is parsed for air conditioner 1, the resulting voice control command is "turn on the air conditioner, dehumidify". This command is erroneous; equivalently, the parsed voice control command contains noise.
If the voice control command parsed from the collected voice information contains noise, the noise needs to be filtered out, and the smart home device is then controlled according to the filtered voice control command.
The execution subject in the embodiments of the present invention may be a server.
In the embodiments of the present invention, voice information may be collected by a microphone, and lip-reading mouth-shape information may be collected by a camera.
The application scenarios described in the embodiments of the present invention are intended to illustrate the technical solutions more clearly and do not limit them. A person skilled in the art will appreciate that, as new service scenarios emerge, the technical solutions provided here are equally applicable to similar technical problems.
For the foregoing application scenarios, an embodiment of the present invention provides a method for controlling a device. As shown in Fig. 1, the method specifically includes the following steps:
Step 100: match the determined voice control command against the mouth-shape information captured when the user issues a voice control command for the smart home device, and determine a first voice control command according to the matching result;
Step 101: control the smart home device according to the first voice control command.
In this embodiment, the determined voice control command is matched against the mouth-shape information captured when the user issues the voice control command for the smart home device, a first voice control command is determined according to the matching result, and the smart home device is controlled according to the first voice control command. Because the command used to control the device is determined by checking the voice command against the user's mouth shapes, the accuracy of extracting the voice control command can be improved.
In implementation, before the voice control command is determined, if the microphone and camera of the smart home device are in a sleep state, the user needs to wake them up.
The microphone and camera may be woken with a wake word: for example, when the user says the wake word "air conditioner 1", the microphone and camera connected to air conditioner 1 are woken up. They may also be woken by remote control, for example by using a remote controller to wake the microphone and camera.
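The two wake-up paths can be sketched as a small state holder. This assumes, for illustration only, that a wake word must match exactly (after lower-casing and collapsing whitespace) and that each device has one microphone and one camera; `WakeManager`, `Sensors`, and the device ids are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Sensors:
    microphone_awake: bool = False
    camera_awake: bool = False


class WakeManager:
    """Wakes the microphone and camera attached to a device either when its
    wake word is heard or when a remote-control wake signal arrives."""

    def __init__(self, wake_words: Dict[str, str]) -> None:
        # wake word (normalised to lower case) -> device id
        self._wake_words = {w.lower(): d for w, d in wake_words.items()}
        self.sensors: Dict[str, Sensors] = {d: Sensors() for d in wake_words.values()}

    def _wake(self, device_id: str) -> None:
        pair = self.sensors[device_id]
        pair.microphone_awake = True
        pair.camera_awake = True

    def on_utterance(self, text: str) -> Optional[str]:
        device_id = self._wake_words.get(" ".join(text.lower().split()))
        if device_id is not None:
            self._wake(device_id)
        return device_id

    def on_remote_control(self, device_id: str) -> None:
        self._wake(device_id)


manager = WakeManager({"air conditioner 1": "ac1", "air conditioner 2": "ac2"})
print(manager.on_utterance("Air conditioner 1"))  # ac1
print(manager.sensors["ac2"].camera_awake)         # False
manager.on_remote_control("ac2")
print(manager.sensors["ac2"].camera_awake)         # True
```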
Once woken, the microphone collects voice information, and the camera collects the lip-reading mouth-shape information of the user addressing the smart home device.
Note that the microphone collects all the voice information it can pick up. For example, if user A says "turn on air conditioner 1" and user B says "dehumidify", and the microphone can pick up both, then the collected voice information contains both "turn on air conditioner 1" and "dehumidify".
The camera, by contrast, collects the mouth-shape information of the user who is addressing the smart home device to be controlled. For example, if user A wants to control smart air conditioner 1, user A needs to stand within the field of view of the camera connected to smart air conditioner 1 and then speak.
Parsing of the voice information and of the mouth-shape information is described separately below.
After collecting voice information, the microphone sends it to the server, which parses it on receipt.
Specifically, the acquired voice information may be parsed with a speech recognition model obtained by training a neural network on voice information, voice control commands, and word counts.
Note that building the speech recognition model requires a large amount of voice information, voice control commands, and word counts. After the neural network is trained, inputting voice information into the speech recognition model yields the corresponding voice control command and word count.
After parsing the acquired voice information, the server obtains the voice control command and word count corresponding to it.
For example, the microphone sends the collected voice information "turn on air conditioner 1" to the server; the server parses it and obtains the control command "turn on", with a word count of 2.
Parsing may fail. For example, if the voice information collected by the microphone connected to an air conditioner is "turn on the television", the server fails to parse it and may push a voice-parsing-failure message to the user.
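The example and the failure case above can be sketched with a rule-based parser. The real server uses the trained speech recognition model; here, as an assumption for illustration, each device type has a small hypothetical command vocabulary, an utterance that names a different kind of device fails, and the word count is simply the number of words in the matched command. `COMMANDS`, `DEVICE_NAMES`, `ParseResult`, and `parse_voice` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical vocabularies; a real system would rely on the trained model.
COMMANDS: Dict[str, Set[str]] = {
    "air_conditioner": {"turn on", "turn off", "dehumidify"},
    "television": {"turn on", "turn off"},
}
DEVICE_NAMES: Dict[str, str] = {
    "air_conditioner": "air conditioner",
    "television": "television",
}


@dataclass
class ParseResult:
    ok: bool
    command: Optional[str] = None
    word_count: int = 0
    message: str = ""


def parse_voice(text: str, device_type: str) -> ParseResult:
    """Parse text addressed to a device of `device_type` into (command, word count)."""
    normalised = " ".join(text.lower().split())
    # Fail if the utterance names a different kind of device.
    for other, name in DEVICE_NAMES.items():
        if other != device_type and name in normalised:
            return ParseResult(False, message="Voice parsing failed")
    # Try longer commands first in case one command is a prefix of another.
    for cmd in sorted(COMMANDS.get(device_type, set()), key=len, reverse=True):
        if normalised.startswith(cmd):
            return ParseResult(True, cmd, len(cmd.split()))
    return ParseResult(False, message="Voice parsing failed")


print(parse_voice("Turn on air conditioner 1", "air_conditioner"))
# ParseResult(ok=True, command='turn on', word_count=2, message='')
print(parse_voice("turn on the television", "air_conditioner").message)
# Voice parsing failed
```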
The above describes the server's parsing of voice information; the server's parsing of mouth-shape information is described next.
After collecting lip-reading mouth-shape information, the camera sends it to the server, which parses it on receipt.
Specifically, the acquired mouth-shape information may be parsed with an image recognition model obtained by training a neural network on mouth-shape information, mouth-shape transition counts, and substitute words.
Note that building the image recognition model requires a large amount of mouth-shape information, mouth-shape transition counts, and substitute words. After the neural network is trained, inputting mouth-shape information into the image recognition model yields the corresponding mouth-shape transition count and substitute words.
After parsing the acquired mouth-shape information, the server obtains the corresponding mouth-shape transition count and substitute words.
After acquiring the voice information, the server first parses it into a voice control command. If parsing succeeds, the server matches the parsed voice control command against the lip-language mouth-shape information captured while the user was issuing that command to the smart home device, determines a first voice control command from the matching result, and controls the smart home device according to the first voice control command.
The matching has two possible outcomes. First, if the matching degree between the parsed voice control command and the lip-language mouth-shape information captured while the user issued the command is less than a threshold, the command is determined to contain noise;
second, if the matching degree is not less than the threshold, the command is determined to be free of noise.
For example, suppose the voice control command parsed by the server is "turn on air conditioner 1" and the threshold is 80%. If matching this command against the user's lip-language mouth-shape information gives a matching degree of 90%, the command is determined to be free of noise;
if the matching degree is instead 70%, the command is determined to contain noise.
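The text does not say how the matching degree is computed, so the Python sketch below makes a hypothetical choice: both recognisers are assumed to emit one token per word unit, and the matching degree is the number of tokens covered by the matching blocks of the standard-library `difflib.SequenceMatcher`, divided by the length of the longer sequence. Only the threshold rule (noise when the degree is below the threshold, with the 80% threshold of the examples) comes from the text; `matching_degree` and `has_noise` are illustrative names.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def matching_degree(voice_tokens: list[str], lip_tokens: list[str]) -> float:
    """Hypothetical matching degree between the two recognisers' outputs.

    Counts the tokens covered by difflib's matching blocks and divides by the
    length of the longer sequence, giving a value between 0.0 and 1.0.
    """
    longest = max(len(voice_tokens), len(lip_tokens))
    if longest == 0:
        return 0.0
    matcher = SequenceMatcher(None, voice_tokens, lip_tokens, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / longest


def has_noise(degree: float, threshold: float = 0.8) -> bool:
    """Noise is present when the matching degree is below the threshold."""
    return degree < threshold


print(has_noise(0.9))  # False: 90% is not below 80%, so no noise
print(has_noise(0.7))  # True: 70% is below 80%, so noise
print(round(matching_degree(["turn", "on", "air", "conditioner", "A", "dehumidify"],
                            ["turn", "on", "air", "conditioner", "A"]), 3))  # 0.833
```

A degree exactly equal to the threshold counts as noise-free, as the "not less than" wording requires.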
If the parsed voice control command is free of noise, it is used directly as the first voice control command, that is, the smart home device is controlled according to it;
if the parsed voice control command contains noise, the noise is filtered out according to the voice word-count information of the command and the lip-language mouth-shape information, the noise-filtered command is used as the first voice control command, and the smart home device is controlled according to it.
In this embodiment of the present invention, noise filtering distinguishes two cases: in case one, the voice word count of the voice control command is greater than the mouth-shape transition count of the lip-language mouth-shape information; in case two, it is not greater. The two cases are described in turn below.
Case one: the voice word count of the voice control command is greater than the mouth-shape transition count of the lip-language mouth-shape information.
In this case, the parts of the voice control command that do not match the lip-language mouth-shape information are discarded.
For example, suppose the voice control command is "turn on air conditioner A to dehumidify", whose voice word count is 7, and the mouth-shape transition count of the lip-language mouth-shape information is 5. Because the voice word count is greater than the mouth-shape transition count, the voice control command is matched against the lip-language mouth-shape information and its unmatched part is discarded. If the unmatched part is "dehumidify", it is discarded, leaving "turn on air conditioner A", which is the noise-filtered voice control command.
The command left after discarding is the second voice control command, which is matched against the lip-language mouth-shape information again. If the matching degree is not less than the threshold, the second voice control command is used as the first voice control command, and the smart home device is controlled according to it;
if the voice word count of the second voice control command equals the mouth-shape transition count but its matching degree with the lip-language mouth-shape information is less than the threshold, the parts of the command that cannot be matched are replaced, according to the replacement principle, with the replacement-word information corresponding to the unmatched lip-language mouth-shape information. This yields the first voice control command, and the smart home device is controlled according to it.
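The discard step of case one can be sketched as follows, as a hypothetical Python illustration. It assumes that each token is one word unit and that the lip-reading model emits one token per mouth-shape transition, so the length of the lip-read sequence stands in for the mouth-shape transition count; voice tokens outside the `difflib.SequenceMatcher` matching blocks are treated as the unmatched parts to discard. The tokens are English words chosen for readability, not the units the patent counts, and `discard_unmatched` is an illustrative name.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def discard_unmatched(voice_tokens: list[str], lip_tokens: list[str]) -> list[str]:
    """Case one: keep only the voice tokens that align with lip-read tokens.

    Assumes one lip-read token per mouth-shape transition, so len(lip_tokens)
    plays the role of the mouth-shape transition count (hypothetical model).
    """
    if len(voice_tokens) <= len(lip_tokens):
        # Not case one: the voice word count does not exceed the transition count.
        return list(voice_tokens)
    matcher = SequenceMatcher(None, voice_tokens, lip_tokens, autojunk=False)
    kept: list[str] = []
    for block in matcher.get_matching_blocks():
        # Each block is a run where voice_tokens[a:a+size] == lip_tokens[b:b+size].
        kept.extend(voice_tokens[block.a:block.a + block.size])
    return kept  # the "second voice control command"


voice = ["turn", "on", "the", "air", "conditioner", "A", "dehumidify"]
lip = ["turn", "on", "air", "conditioner", "A"]
print(discard_unmatched(voice, lip))  # ['turn', 'on', 'air', 'conditioner', 'A']
```

The result would then be matched against the lip-read sequence again, as the text describes for the second voice control command.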
Case two: the voice word count of the voice control command is not greater than the mouth-shape transition count of the lip-language mouth-shape information.
In this case, the parts of the voice control command that cannot be matched with the lip-language mouth-shape information are replaced with the corresponding replacement-word information according to the replacement principle.
The replacement principle is as follows: determine whether the part of the voice control command to be replaced and the replacement word have similar meanings; if they do, perform the replacement; if not, voice parsing fails.
The following example shows how noise is filtered when the voice word count of the voice control command equals the mouth-shape transition count of the lip-language mouth-shape information.
For example, suppose the voice control command is "turn on air conditioner A, increase", whose voice word count is 7, and the mouth-shape transition count of the lip-language mouth-shape information is also 7. Because the two are equal, the voice control command is matched against the lip-language mouth-shape information. Suppose "increase" does not match; correspondingly, part of the lip-language mouth-shape information does not match the voice control command, and the replacement word for that part is considered under the replacement principle. If that replacement word is "dehumidify", the server determines that "increase" and "dehumidify" do not have similar meanings, so no replacement is made and parsing of the voice information fails;
if instead the replacement word is "raise", the server determines that "increase" and "raise" have similar meanings and replaces "increase" with "raise". The resulting command, "turn on air conditioner A, raise", is the noise-filtered voice control command.
The replacement-word information is derived from a large number of experimental results and is stored on the server in advance.
The server may judge whether a part of the voice control command and a replacement word have similar meanings using data stored on the server; for example, if the stored similarity of the meanings of "increase" and "raise" is 90%, the replacement may be performed.
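The replacement principle can be sketched in Python as below. The similarity table, its 0.8 acceptance cut-off, and the position-by-position pairing of voice tokens with lip-read replacement words are all hypothetical: the text only says that similarity data are stored on the server and that a 90% similarity permits replacement. `SIMILARITY`, `similarity`, and `replace_unmatched` are illustrative names.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical stand-in for the similarity data stored on the server.
SIMILARITY: dict[frozenset[str], float] = {
    frozenset({"increase", "raise"}): 0.9,
}


def similarity(a: str, b: str) -> float:
    """Look up the stored similarity of two words (identical words score 1.0)."""
    if a == b:
        return 1.0
    return SIMILARITY.get(frozenset({a, b}), 0.0)


def replace_unmatched(voice_tokens: list[str], replacement_words: list[str],
                      min_similarity: float = 0.8) -> Optional[list[str]]:
    """Replace each voice token that differs from the replacement word at the
    same position, but only when their meanings are similar enough.

    Returns None when a replacement is rejected, i.e. voice parsing fails.
    """
    if len(voice_tokens) != len(replacement_words):
        return None  # position-wise pairing needs equal counts
    result: list[str] = []
    for spoken, candidate in zip(voice_tokens, replacement_words):
        if spoken == candidate:
            result.append(spoken)
        elif similarity(spoken, candidate) >= min_similarity:
            result.append(candidate)
        else:
            return None
    return result


command = ["air", "conditioner", "A", "increase"]
print(replace_unmatched(command, ["air", "conditioner", "A", "raise"]))
print(replace_unmatched(command, ["air", "conditioner", "A", "dehumidify"]))  # None
```

The two calls mirror the two outcomes of the example above: the similar word "raise" is accepted, while "dehumidify" is rejected and parsing fails.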
Note that if the voice word count of the voice control command is less than the mouth-shape transition count of the lip-language mouth-shape information, the server may either perform replacement according to the replacement principle or treat parsing of the voice information as failed.
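Taken together, comparing the voice word count with the mouth-shape transition count decides which filtering path applies. The Python sketch below is hypothetical: `FilterCase` and `choose_filter_case` are illustrative names, and `replace_when_fewer` is an invented switch between the two options the text allows when the voice word count is smaller. Its default of failing agrees with step 408 of the fig. 4 flow, which exits when the counts are unequal.

```python
from enum import Enum


class FilterCase(Enum):
    DISCARD = "discard"   # case one: discard unmatched parts, then re-match
    REPLACE = "replace"   # replace unmatched parts under the replacement principle
    FAIL = "fail"         # treat voice parsing as failed


def choose_filter_case(word_count: int, transition_count: int,
                       replace_when_fewer: bool = False) -> FilterCase:
    """Pick the noise-filtering path from the two counts (illustrative only)."""
    if word_count > transition_count:
        return FilterCase.DISCARD
    if word_count == transition_count:
        return FilterCase.REPLACE
    # Fewer voice words than mouth-shape transitions: either policy is allowed.
    return FilterCase.REPLACE if replace_when_fewer else FilterCase.FAIL


print(choose_filter_case(7, 5).value)  # discard
print(choose_filter_case(7, 7).value)  # replace
print(choose_filter_case(5, 7).value)  # fail
```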
In this embodiment of the invention, after the acquired voice information is parsed into a voice control command, the command is first matched against the lip-language mouth-shape information captured while the user issued it to the smart home device. If the matching result shows that the command is free of noise, the smart home device is controlled according to the parsed command. If the matching result shows that the command contains noise, the noise is filtered out according to the command's voice word-count information and the lip-language mouth-shape information, and the smart home device is then controlled according to the noise-filtered command. This improves the accuracy with which voice control commands are extracted.
Based on the same inventive concept, an embodiment of the present invention further provides an apparatus for controlling a device. Because this apparatus corresponds to the method for controlling a device provided in the embodiments of the present invention and solves the problem on a similar principle, its implementation can refer to that of the method, and repeated details are omitted.
As shown in fig. 2, an apparatus for controlling a device according to a first embodiment of the present invention includes at least one processing unit 200 and at least one storage unit 201. The storage unit 201 stores program code that, when executed by the processing unit 200, causes the processing unit 200 to perform the following:
matching the determined voice control command against the lip-language mouth-shape information captured while the user issued the voice control command to the smart home device, and determining a first voice control command according to the matching result;
and controlling the smart home device according to the first voice control command.
Optionally, the processing unit 200 is specifically configured to:
determining whether the matching degree between the determined voice control command and the lip-language mouth-shape information captured while the user issued the voice control command to the smart home device is less than a threshold; if so, determining that the determined voice control command contains noise;
otherwise, determining that the determined voice control command is free of noise.
Optionally, the processing unit 200 is specifically configured to:
if the determined voice control command contains noise, determining the first voice control command according to the voice word-count information of the determined voice control command and the lip-language mouth-shape information;
and if the determined voice control command is free of noise, taking the determined voice control command as the first voice control command.
Optionally, the processing unit 200 is specifically configured to:
if the voice word count of the determined voice control command is greater than the mouth-shape transition count of the lip-language mouth-shape information, discarding the parts of the determined voice control command that do not match the lip-language mouth-shape information to obtain a second voice control command;
and if the voice word count of the second voice control command equals the mouth-shape transition count of the lip-language mouth-shape information, and the matching degree between the second voice control command and the lip-language mouth-shape information is less than the threshold, replacing the parts that cannot be matched with the corresponding replacement-word information according to the replacement principle to obtain the first voice control command.
Optionally, the processing unit 200 is specifically configured to:
if the voice word count of the determined voice control command is not greater than the mouth-shape transition count of the lip-language mouth-shape information, replacing the parts of the command that cannot be matched with the lip-language mouth-shape information with the corresponding replacement-word information according to the replacement principle to obtain the first voice control command.
Optionally, the processing unit 200 is specifically configured to:
parsing the acquired voice information with a voice recognition model obtained by training a neural network on voice information, voice control commands, and voice word-count information.
Optionally, the processing unit 200 is specifically configured to:
parsing the acquired lip-language mouth-shape information with an image recognition model obtained by training a neural network on lip-language mouth-shape information, mouth-shape transition count information, and replacement-word information.
As shown in fig. 3, an apparatus for controlling a device according to a second embodiment of the present invention includes a determination module 300 and a control module 301:
the determination module 300 is configured to match the determined voice control command against the lip-language mouth-shape information captured while the user issued the voice control command to the smart home device, and to determine a first voice control command according to the matching result;
the control module 301 is configured to control the smart home device according to the first voice control command.
Optionally, the determination module 300 is specifically configured to:
determining whether the matching degree between the determined voice control command and the lip-language mouth-shape information captured while the user issued the voice control command to the smart home device is less than a threshold; if so, determining that the determined voice control command contains noise;
otherwise, determining that the determined voice control command is free of noise.
Optionally, the determination module 300 is specifically configured to:
if the determined voice control command contains noise, determining the first voice control command according to the voice word-count information of the determined voice control command and the lip-language mouth-shape information;
and if the determined voice control command is free of noise, taking the determined voice control command as the first voice control command.
Optionally, the determination module 300 is specifically configured to:
if the voice word count of the determined voice control command is greater than the mouth-shape transition count of the lip-language mouth-shape information, discarding the parts of the determined voice control command that do not match the lip-language mouth-shape information to obtain a second voice control command;
and if the voice word count of the second voice control command equals the mouth-shape transition count of the lip-language mouth-shape information, and the matching degree between the second voice control command and the lip-language mouth-shape information is less than the threshold, replacing the parts that cannot be matched with the corresponding replacement-word information according to the replacement principle to obtain the first voice control command.
Optionally, the determination module 300 is specifically configured to:
if the voice word count of the determined voice control command is not greater than the mouth-shape transition count of the lip-language mouth-shape information, replacing the parts of the command that cannot be matched with the lip-language mouth-shape information with the corresponding replacement-word information according to the replacement principle to obtain the first voice control command.
Optionally, the determination module 300 is specifically configured to:
parsing the acquired voice information with a voice recognition model obtained by training a neural network on voice information, voice control commands, and voice word-count information.
Optionally, the determination module 300 is specifically configured to:
parsing the acquired lip-language mouth-shape information with an image recognition model obtained by training a neural network on lip-language mouth-shape information, mouth-shape transition count information, and replacement-word information.
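The split between the determination module and the control module of the second embodiment can be pictured as two small objects. In this hypothetical Python sketch, the decision logic is injected as a callable and the control module merely records the commands it would send, because the text does not describe the device interface; `DeterminationModule`, `ControlModule`, `Decider`, and `handle` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, List, Optional

# (voice tokens, lip-read tokens) -> first voice control command, or None on failure
Decider = Callable[[List[str], List[str]], Optional[List[str]]]


@dataclass
class DeterminationModule:
    """Plays the role of determination module 300."""
    decide: Decider

    def determine(self, voice: List[str], lip: List[str]) -> Optional[List[str]]:
        return self.decide(voice, lip)


@dataclass
class ControlModule:
    """Plays the role of control module 301; records commands instead of sending them."""
    sent: List[str] = field(default_factory=list)

    def control(self, command: List[str]) -> None:
        self.sent.append(" ".join(command))


def handle(determiner: DeterminationModule, controller: ControlModule,
           voice: List[str], lip: List[str]) -> bool:
    """Run one determine-then-control cycle; False if no command was determined."""
    command = determiner.determine(voice, lip)
    if command is None:
        return False
    controller.control(command)
    return True


# Toy decider: accept the voice command only if lip reading agrees exactly.
exact = DeterminationModule(lambda v, l: v if v == l else None)
device = ControlModule()
handle(exact, device, ["turn", "on"], ["turn", "on"])
handle(exact, device, ["turn", "on"], ["turn", "off"])
print(device.sent)  # ['turn on']
```

Injecting the decision function keeps the control module independent of how the first voice control command is determined, which mirrors the module boundary described above.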
As shown in fig. 4, a complete method for controlling a device according to an embodiment of the present invention includes the following steps:
step 400, acquiring the voice information and the lip-language mouth-shape information of the user directed at the smart home device;
step 401, parsing the voice information and the lip-language mouth-shape information;
step 402, determining whether a voice control command has been parsed; if so, executing step 403, otherwise exiting;
step 403, matching the voice control command against the lip-language mouth-shape information;
step 404, determining whether the matching degree is greater than the threshold; if so, executing step 411, otherwise executing step 405;
step 405, comparing the voice word count obtained by parsing the voice information with the mouth-shape transition count obtained by parsing the lip-language mouth-shape information;
step 406, determining whether the voice word count is greater than the mouth-shape transition count; if so, executing step 407, otherwise executing step 408;
step 407, discarding the parts of the voice control command that do not match the lip-language mouth-shape information, and returning to step 403;
step 408, determining whether the voice word count equals the mouth-shape transition count; if so, executing step 409, otherwise exiting;
step 409, replacing the parts of the voice control command that cannot be matched with the lip-language mouth-shape information with the corresponding replacement-word information according to the replacement principle;
step 410, obtaining the final voice control command;
step 411, controlling the smart home device according to the final voice control command.
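The flow of fig. 4 can be sketched end to end in Python under hypothetical assumptions: one token per word unit; one lip-read token per mouth-shape transition, which also serves as the replacement word at that position; a `difflib`-based matching degree; a similarity table standing in for the data stored on the server; and illustrative thresholds of 0.8. Parsing (steps 400 and 401) is outside the sketch, and `voice=None` represents a failed parse. Step 404 is written with `>=` so that it agrees with the "not less than" wording of the description. `run_flow`, `replace_step`, and `matching_degree` are illustrative names.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from typing import Dict, FrozenSet, List, Optional

THRESHOLD = 0.8       # hypothetical matching threshold (80%, as in the examples)
MIN_SIMILARITY = 0.8  # hypothetical cut-off for the replacement principle


def matching_degree(voice: List[str], lip: List[str]) -> float:
    """Hypothetical metric: matched tokens / length of the longer sequence."""
    longest = max(len(voice), len(lip))
    if longest == 0:
        return 0.0
    blocks = SequenceMatcher(None, voice, lip, autojunk=False).get_matching_blocks()
    return sum(b.size for b in blocks) / longest


def replace_step(voice: List[str], lip: List[str],
                 similarity: Dict[FrozenSet[str], float]) -> Optional[List[str]]:
    """Step 409: swap unmatched words for similar lip-read replacement words."""
    result: List[str] = []
    for spoken, candidate in zip(voice, lip):
        if spoken == candidate:
            result.append(spoken)
        elif similarity.get(frozenset({spoken, candidate}), 0.0) >= MIN_SIMILARITY:
            result.append(candidate)
        else:
            return None  # meanings not similar: voice parsing fails
    return result


def run_flow(voice: Optional[List[str]], lip: List[str],
             similarity: Dict[FrozenSet[str], float]) -> Optional[List[str]]:
    """Steps 402-410; returns the final command for step 411, or None to exit."""
    while voice:  # step 402: exit if no (or no remaining) voice control command
        if matching_degree(voice, lip) >= THRESHOLD:  # steps 403-404
            return voice
        if len(voice) > len(lip):  # steps 405-406
            blocks = SequenceMatcher(None, voice, lip, autojunk=False).get_matching_blocks()
            # Step 407: keep matched words only, then go back to step 403.
            voice = [w for b in blocks for w in voice[b.a:b.a + b.size]]
            continue
        if len(voice) != len(lip):  # step 408: counts unequal, so exit
            return None
        return replace_step(voice, lip, similarity)  # steps 409-410
    return None


table = {frozenset({"increase", "raise"}): 0.9}
print(run_flow(["turn", "on", "the", "air", "conditioner", "A", "dehumidify"],
               ["turn", "on", "air", "conditioner", "A"], table))
print(run_flow(["air", "conditioner", "A", "increase"],
               ["air", "conditioner", "A", "raise"], table))
```

The loop always terminates: after a discard, the remaining command has no more words than the lip-read sequence, so the discard branch cannot run twice.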
The present application is described above with reference to block diagrams and/or flowchart illustrations of methods, apparatus (systems) and/or computer program products according to embodiments of the application. It will be understood that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, and/or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the subject application may also be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.). Further, the present application may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with a command execution system. In the context of this application, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the command execution system, apparatus, or device.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (12)

1. A method of controlling a device, the method comprising:
matching a determined voice control command against lip-language mouth-shape information captured while a user issues the voice control command to a smart home device, and determining a first voice control command according to a matching result;
controlling the smart home device according to the first voice control command;
wherein determining the first voice control command according to the matching result comprises:
if the determined voice control command contains noise, determining the first voice control command according to voice word-count information corresponding to the determined voice control command and the lip-language mouth-shape information;
and if the determined voice control command contains no noise, taking the determined voice control command as the first voice control command.
2. The method of claim 1, wherein the matching result is determined by:
determining whether a matching degree between the determined voice control command and the lip-language mouth-shape information captured while the user issues the voice control command to the smart home device is less than a threshold; if so, determining that the determined voice control command contains noise;
otherwise, determining that the determined voice control command contains no noise.
3. The method of claim 1, wherein determining the first voice control command according to the voice word-count information corresponding to the determined voice control command and the lip-language mouth-shape information comprises:
if the voice word-count information corresponding to the determined voice control command is greater than mouth-shape transition count information corresponding to the lip-language mouth-shape information, discarding the parts of the determined voice control command that do not match the lip-language mouth-shape information to obtain a second voice control command;
and if the voice word-count information corresponding to the second voice control command equals the mouth-shape transition count information corresponding to the lip-language mouth-shape information, and a matching degree between the second voice control command and the lip-language mouth-shape information captured while the user issues the voice control command to the smart home device is less than a threshold, replacing the parts that cannot be matched with the lip-language mouth-shape information with replacement-word information corresponding to the lip-language mouth-shape information according to a replacement principle, to obtain the first voice control command.
4. The method of claim 1, wherein determining the first voice control command according to the voice word-count information corresponding to the determined voice control command and the lip-language mouth-shape information comprises:
if the voice word-count information corresponding to the determined voice control command is not greater than mouth-shape transition count information corresponding to the lip-language mouth-shape information, replacing the parts of the determined voice control command that cannot be matched with the lip-language mouth-shape information with replacement-word information corresponding to the lip-language mouth-shape information according to a replacement principle, to obtain the first voice control command.
5. The method of claim 3 or 4, wherein the voice word-count information corresponding to the determined voice control command is determined by:
parsing acquired voice information with a voice recognition model, wherein the voice recognition model is obtained by training a neural network on voice information, voice control commands, and voice word-count information.
6. The method of claim 3 or 4, wherein the mouth-shape transition count information and the replacement-word information corresponding to the lip-language mouth-shape information are determined by:
parsing the acquired lip-language mouth-shape information with an image recognition model, wherein the image recognition model is obtained by training a neural network on lip-language mouth-shape information, mouth-shape transition count information, and replacement-word information.
7. An apparatus for controlling a device, the apparatus comprising: at least one processing unit and at least one memory unit, wherein the memory unit stores program code that, when executed by the processing unit, causes the processing unit to perform the following:
matching the determined voice control command with lip language mouth shape information when a user sends the voice control command aiming at the intelligent household equipment, and determining a first voice control command according to a matching result;
controlling the intelligent household equipment according to the first voice control command;
wherein the processing unit is specifically configured to:
if the determined voice control command has noise, determining a first voice control command according to the voice word number information corresponding to the determined voice control command and the lip language mouth shape information;
and if the determined voice control command has no noise, taking the determined voice control command as the first voice control command.
8. The apparatus as claimed in claim 7, wherein said processing unit is specifically configured to:
judging whether the matching degree of the determined voice control command and lip language mouth shape information when the voice control command aiming at the intelligent household equipment is sent by the user is smaller than a threshold value or not, if so, determining that noise exists in the determined voice control command;
otherwise, determining that the determined voice control command is free of noise.
9. The apparatus as claimed in claim 7, wherein said processing unit is specifically configured to:
if the voice word number information corresponding to the determined voice control command is larger than the mouth shape conversion frequency information corresponding to the lip language mouth shape information, discarding the voice control command which is not matched with the lip language mouth shape information in the determined voice control command to obtain a second voice control command;
and if the voice word number information corresponding to the second voice control command is equal to the mouth shape conversion frequency information corresponding to the lip language mouth shape information, and the matching degree of the second voice control command and the lip language mouth shape information when the voice control command for the intelligent household equipment is sent by the user is smaller than a threshold value, replacing the voice control command which cannot be matched with the lip language mouth shape information with replacement word information corresponding to the lip language mouth shape information according to a replacement principle to obtain a first voice control command.
10. The apparatus as claimed in claim 7, wherein said processing unit is specifically configured to:
and if the voice word number information corresponding to the determined voice control command is not larger than the mouth shape conversion frequency information corresponding to the lip language mouth shape information, replacing the voice control command which cannot be matched with the lip language mouth shape information with replacement word information corresponding to the lip language mouth shape information according to a replacement principle to obtain a first voice control command.
11. The apparatus according to claim 9 or 10, wherein the processing unit is specifically configured to:
and analyzing the acquired voice information according to a voice recognition model, wherein the voice recognition model is obtained by training a neural network according to the voice information, the voice control command and the voice word number information.
12. The apparatus according to claim 9 or 10, wherein the processing unit is specifically configured to:
analyze the acquired lip language mouth shape information according to an image recognition model, wherein the image recognition model is obtained by training a neural network on the lip language mouth shape information, the mouth shape conversion frequency information and the replacement word information.
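Claim 9 reconciles two channels: the word count from speech recognition and the number of mouth-shape transitions from lip reading. The claims do not say how a word is judged to "match" a mouth shape, how the matching degree is computed, or how many words are discarded. The Python sketch below therefore fills those gaps with stated assumptions:

- a hand-written word-to-mouth-shape table (`MOUTH_SHAPE_OF`);
- a positional match ratio as the matching degree;
- discarding unmatched words only until the word count no longer exceeds the transition count.

All names, the table and the 0.8 threshold are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical word -> mouth-shape table standing in for a real lip/viseme model.
MOUTH_SHAPE_OF = {"turn": "U", "on": "O", "in": "I", "the": "E", "fan": "A"}


@dataclass
class LipSegment:
    shape: str        # mouth shape observed for one mouth-shape transition
    replacement: str  # replacement word associated with that mouth shape


def match_degree(words, segments):
    """Share of positions where the spoken word produces the observed mouth shape."""
    if not segments:
        return 0.0
    hits = sum(MOUTH_SHAPE_OF.get(w) == s.shape for w, s in zip(words, segments))
    return hits / len(segments)


def fuse_when_longer(words, segments, threshold=0.8):
    """Claim 9 branch: more recognised words than mouth-shape transitions."""
    if len(words) <= len(segments):
        return list(words)  # outside this branch (see claim 10)
    # Step 1: discard words whose mouth shape never occurs in the lip sequence,
    # stopping once the word count no longer exceeds the transition count.
    shapes = {s.shape for s in segments}
    excess = len(words) - len(segments)
    second = []
    for w in words:
        if excess > 0 and MOUTH_SHAPE_OF.get(w) not in shapes:
            excess -= 1
            continue
        second.append(w)
    # Step 2: counts now agree but the match is weak, so swap each mismatched
    # word for the replacement word tied to that position's mouth shape.
    if len(second) == len(segments) and match_degree(second, segments) < threshold:
        return [w if MOUTH_SHAPE_OF.get(w) == s.shape else s.replacement
                for w, s in zip(second, segments)]
    return second


lips = [LipSegment("U", "turn"), LipSegment("O", "on"), LipSegment("A", "fan")]
print(fuse_when_longer(["turn", "in", "the", "fan"], lips))  # ['turn', 'on', 'fan']
```

The example runs in three stages:

1. "in" is dropped as surplus during the discard step.
2. "the" survives the discard step (its shape is not in the lip sequence, but the surplus is already used up) and then fails its position, giving a matching degree of 2/3.
3. Because 2/3 is below the threshold, "the" is replaced with the lip-derived word "on".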
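Claim 10 covers the case where speech recognition returns no more words than there are mouth-shape transitions. The minimal Python sketch below assumes positional alignment and a hypothetical word-to-mouth-shape table. It treats a position as unmatched if its spoken word does not produce the observed mouth shape, or if there is no spoken word at that position at all. Each unmatched position gets the lip-derived replacement word. The function name, the table and the (shape, replacement) pairs are illustrative and are not taken from the patent.

```python
# Hypothetical word -> mouth-shape table standing in for a real lip/viseme model.
MOUTH_SHAPE_OF = {"turn": "U", "on": "O", "off": "O", "fan": "A"}


def fuse_when_not_longer(words, lip_segments):
    """Claim 10 branch: recognised word count <= mouth-shape transition count.

    lip_segments holds one (mouth_shape, replacement_word) pair per
    mouth-shape transition. A position keeps the spoken word only if that
    word produces the observed mouth shape; otherwise, including positions
    the speech recogniser missed entirely, the replacement word is used.
    """
    if len(words) > len(lip_segments):
        raise ValueError("more words than transitions; claim 9 branch applies")
    fused = []
    for i, (shape, replacement) in enumerate(lip_segments):
        word = words[i] if i < len(words) else None
        fused.append(word if MOUTH_SHAPE_OF.get(word) == shape else replacement)
    return fused


lips = [("U", "turn"), ("O", "on"), ("A", "fan")]
print(fuse_when_not_longer(["turn", "fan"], lips))         # ['turn', 'on', 'fan']
print(fuse_when_not_longer(["turn", "off", "fan"], lips))  # ['turn', 'off', 'fan']
```

In this table "on" and "off" share mouth shape "O", so the lip channel cannot overrule the speech recogniser for that pair. Replacement only happens where the two channels disagree.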
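Claim 11 names three training inputs for the voice recognition model but does not describe the network itself. The sketch below therefore shows only one plausible way to assemble a training record. It assumes the voice word number information is derived from the transcribed command rather than annotated separately. The counting rule is a hypothetical choice: one unit per Chinese character (one syllable each), and one unit per other whitespace-separated token. All names are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpeechSample:
    features: List[float]  # acoustic features of the voice information
    command: str           # voice control command (transcription label)
    word_count: int        # voice word number information (count label)


def is_cjk(token: str) -> bool:
    """True if every character is a CJK Unified Ideograph (U+4E00..U+9FFF)."""
    return all("\u4e00" <= ch <= "\u9fff" for ch in token)


def count_words(command: str) -> int:
    # Hypothetical rule: a CJK token counts one unit per character (one
    # syllable each); any other whitespace-separated token counts as one.
    return sum(len(t) if is_cjk(t) else 1 for t in command.split())


def make_sample(features: List[float], command: str) -> SpeechSample:
    """Bundle one training record; the count label is derived, not annotated."""
    return SpeechSample(list(features), command, count_words(command))


print(count_words("打开空调"), count_words("turn on the fan"))  # 4 4
```

Deriving the count from the transcript keeps the two labels consistent. That consistency matters because the word count is the quantity later compared against the mouth-shape transition count.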
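Claim 12 does not define how the mouth shape conversion frequency information is obtained. Assume the lip images are first labelled frame by frame. One simple reading is then to count runs of identical non-neutral mouth shapes. The hypothetical Python sketch below does that and pairs each transition with a replacement word to form a training record. `REST`, `LipSample` and the shape labels are illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List

REST = "closed"  # hypothetical label for a neutral, closed mouth


@dataclass
class LipSample:
    frames: List[str]             # per-frame mouth-shape labels
    transition_count: int         # mouth shape conversion frequency information
    replacement_words: List[str]  # one replacement word per transition


def count_transitions(frames: List[str]) -> int:
    """Count runs of identical consecutive mouth shapes, ignoring rest runs."""
    return sum(1 for shape, _ in groupby(frames) if shape != REST)


def make_lip_sample(frames: List[str], replacement_words: List[str]) -> LipSample:
    n = count_transitions(frames)
    if n != len(replacement_words):
        raise ValueError(f"{n} transitions but {len(replacement_words)} words")
    return LipSample(list(frames), n, list(replacement_words))


frames = ["closed", "closed", "U", "U", "O", "A", "A", "closed"]
print(count_transitions(frames))  # 3
```

`itertools.groupby` merges consecutive identical labels. As a result, two adjacent syllables with the same mouth shape would be counted once; a real model would need timing or lip-closure cues to separate them.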
CN201811381967.3A 2018-11-20 2018-11-20 Method and device for controlling equipment Active CN109377995B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811381967.3A CN109377995B (en) 2018-11-20 2018-11-20 Method and device for controlling equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811381967.3A CN109377995B (en) 2018-11-20 2018-11-20 Method and device for controlling equipment

Publications (2)

Publication Number Publication Date
CN109377995A (en) 2019-02-22
CN109377995B (en) 2021-06-01

Family ID: 65389650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811381967.3A Active CN109377995B (en) 2018-11-20 2018-11-20 Method and device for controlling equipment

Country Status (1)

Country Link
CN (1) CN109377995B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276259B (en) * 2019-05-21 2024-04-02 平安科技(深圳)有限公司 Lip language identification method, device, computer equipment and storage medium
CN110262278B (en) * 2019-07-31 2020-12-11 珠海格力电器股份有限公司 Control method and device of intelligent household electrical appliance and intelligent household electrical appliance
CN111028842B (en) * 2019-12-10 2021-05-11 上海芯翌智能科技有限公司 Method and equipment for triggering voice interaction response
CN111243585B (en) * 2020-01-07 2022-11-22 百度在线网络技术(北京)有限公司 Control method, device and equipment under multi-user scene and storage medium
CN111309283B (en) * 2020-03-25 2023-12-05 北京百度网讯科技有限公司 Voice control method and device of user interface, electronic equipment and storage medium
CN113763941A (en) * 2020-06-01 2021-12-07 青岛海尔洗衣机有限公司 Voice recognition method, voice recognition system and electrical equipment

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
CN102324035A (en) * 2011-08-19 2012-01-18 广东好帮手电子科技股份有限公司 Method and system of applying lip posture assisted speech recognition technique to vehicle navigation
CN102368198A (en) * 2011-10-04 2012-03-07 上海量明科技发展有限公司 Method and system for carrying out information cue through lip images
CN103177238B (en) * 2011-12-26 2019-01-15 宇龙计算机通信科技(深圳)有限公司 Terminal and user identification method
CN105096935B (en) * 2014-05-06 2019-08-09 阿里巴巴集团控股有限公司 A kind of pronunciation inputting method, device and system
CN105703978A (en) * 2014-11-24 2016-06-22 武汉物联远科技有限公司 Smart home control system and method
CN104409075B (en) * 2014-11-28 2018-09-04 深圳创维-Rgb电子有限公司 Audio recognition method and system
CN106157957A (en) * 2015-04-28 2016-11-23 中兴通讯股份有限公司 Audio recognition method, device and subscriber equipment
CN108346427A (en) * 2018-02-05 2018-07-31 广东小天才科技有限公司 A kind of audio recognition method, device, equipment and storage medium
CN108428453A (en) * 2018-03-27 2018-08-21 王凯 A kind of intelligent terminal control system based on lip reading identification

Also Published As

Publication number Publication date
CN109377995A (en) 2019-02-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant