CN113689858B - Control method and device of cooking equipment, electronic equipment and storage medium - Google Patents

Control method and device of cooking equipment, electronic equipment and storage medium

Info

Publication number
CN113689858B
CN113689858B (application number CN202110963243.5A)
Authority
CN
China
Prior art keywords
cooking
command word
control
voice
cooking equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110963243.5A
Other languages
Chinese (zh)
Other versions
CN113689858A (en)
Inventor
胡子坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Midea Group Co Ltd
Guangdong Midea Kitchen Appliances Manufacturing Co Ltd
Original Assignee
Midea Group Co Ltd
Guangdong Midea Kitchen Appliances Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Midea Group Co Ltd, Guangdong Midea Kitchen Appliances Manufacturing Co Ltd filed Critical Midea Group Co Ltd
Priority to CN202110963243.5A priority Critical patent/CN113689858B/en
Publication of CN113689858A publication Critical patent/CN113689858A/en
Application granted granted Critical
Publication of CN113689858B publication Critical patent/CN113689858B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L15/25 Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G10L2015/223 Execution procedure of a spoken command
    • A HUMAN NECESSITIES
    • A47 FURNITURE; DOMESTIC ARTICLES OR APPLIANCES; COFFEE MILLS; SPICE MILLS; SUCTION CLEANERS IN GENERAL
    • A47J KITCHEN EQUIPMENT; COFFEE MILLS; SPICE MILLS; APPARATUS FOR MAKING BEVERAGES
    • A47J36/00 Parts, details or accessories of cooking-vessels
    • A47J36/32 Time-controlled igniting mechanisms or alarm devices
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Food Science & Technology (AREA)
  • Electric Ovens (AREA)

Abstract

The application relates to the technical field of household appliances, and in particular to a control method and device for a cooking device, an electronic device, and a storage medium. The method comprises the following steps: collecting voice data of a user and a user image; when the confidence of the voice data is detected to be smaller than a first preset threshold, extracting action features based on the user image; and determining a control instruction for the cooking device according to the voice data and the action features, so as to control the cooking device to execute the corresponding cooking action. This solves the problem in the related art that the cooking device cannot be controlled because speech is misrecognized or difficult to recognize, enables control of the cooking device, makes voice recognition more accurate, greatly improves the recognition effect, and improves the user experience.

Description

Control method and device of cooking equipment, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of home appliances, and in particular, to a control method and apparatus for a cooking device, an electronic device, and a storage medium.
Background
Currently, most cooking apparatuses are provided with a voice recognition function, but if the ambient noise is too loud, the voice recognition experience may be poor.
In the related art, voice features are extracted after noise-reduction processing of the collected voice signals, so as to improve voice recognition accuracy.
However, when the signal-to-noise ratio is low or the collected sound is faint, misrecognition or failed recognition can still leave the cooking apparatus uncontrollable.
Summary of the application
The application provides a control method and device for a cooking device, an electronic device, and a storage medium, which solve the problem in the related art that the cooking device cannot be controlled because of voice recognition errors or recognition difficulty, enable control of the cooking device, make voice recognition more accurate, greatly improve the recognition effect, and improve the user experience.
An embodiment of a first aspect of the present application provides a control method of a cooking apparatus, including the steps of:
collecting voice data and user images of a user;
extracting lip features based on the user image when the confidence level of the voice data is detected to be smaller than a first preset threshold value; and
determining a control instruction of the cooking equipment according to the voice data and the lip features, so as to control the cooking equipment to execute a corresponding cooking action.
Optionally, the method further comprises:
when the confidence coefficient of the voice data is detected to be larger than or equal to a first preset threshold value, determining a control instruction of the cooking equipment according to the voice data so as to control the cooking equipment to execute corresponding cooking actions.
Optionally, the method further comprises:
collecting environmental noise data;
if the environmental noise data is smaller than a second preset threshold value, determining a control instruction of the cooking equipment according to the voice data so as to control the cooking equipment to execute corresponding cooking actions;
otherwise, determining a control instruction of the cooking equipment according to the voice data and the lip characteristics so as to control the cooking equipment to execute corresponding cooking actions.
Optionally, the determining the control instruction of the cooking device according to the voice data and the lip feature includes:
recognizing a voice command word based on the voice data, and obtaining a lip language recognition result based on the lip feature;
obtaining the matching degree of the voice command word and the lip language recognition result;
if the command word belongs to a first command word database, determining a control instruction of the cooking equipment according to the voice command word and the lip language identification result when the matching degree is larger than or equal to a third preset threshold value;
if the command word belongs to a second command word database, determining a control instruction of the cooking equipment according to the voice command word and the lip language identification result when the matching degree is larger than or equal to a fourth preset threshold value; wherein the third preset threshold is greater than the fourth preset threshold.
Optionally, the method further comprises:
and if the matching degree is smaller than a fourth preset threshold value, performing control failure prompt on the user.
Optionally, the method further comprises:
after control failure prompt is carried out, acquiring limb data of a user;
and extracting action characteristics according to the limb data, and controlling the cooking equipment to execute corresponding cooking actions according to the control instructions determined by the action characteristics.
Optionally, the limb data comprises gesture data.
An embodiment of a second aspect of the present application provides a control device of a cooking apparatus, including:
the first acquisition module is used for acquiring voice data of a user and a user image;
the extraction module is used for extracting lip features based on the user image when the confidence coefficient of the voice data is detected to be smaller than a first preset threshold value; and
the first control module is used for determining a control instruction of the cooking equipment according to the voice data and the lip features, so as to control the cooking equipment to execute a corresponding cooking action.
Optionally, the device further comprises:
and the second control module is used for determining a control instruction of the cooking equipment according to the voice data when the confidence coefficient of the voice data is detected to be larger than or equal to a first preset threshold value so as to control the cooking equipment to execute corresponding cooking actions.
Optionally, the device further comprises:
the second acquisition module is used for acquiring environmental noise data;
the third control module is used for determining a control instruction of the cooking equipment according to the voice data if the environmental noise data is smaller than a second preset threshold value so as to control the cooking equipment to execute corresponding cooking actions; otherwise, determining a control instruction of the cooking equipment according to the voice data and the lip characteristics so as to control the cooking equipment to execute corresponding cooking actions.
Optionally, the first control module is specifically configured to:
recognizing a voice command word based on the voice data, and obtaining a lip language recognition result based on the lip feature;
obtaining the matching degree of the voice command word and the lip language recognition result;
if the command word belongs to a first command word database, determining a control instruction of the cooking equipment according to the voice command word and the lip language identification result when the matching degree is larger than or equal to a third preset threshold value;
if the command word belongs to a second command word database, determining a control instruction of the cooking equipment according to the voice command word and the lip language identification result when the matching degree is larger than or equal to a fourth preset threshold value; wherein the third preset threshold is greater than the fourth preset threshold.
Optionally, the device further comprises:
and the prompting module is used for prompting the control failure of the user if the matching degree is smaller than a fourth preset threshold value.
Optionally, the device further comprises:
the third acquisition module is used for acquiring limb data of a user after control failure prompt is carried out;
and the fourth control module is used for extracting action characteristics according to the limb data and controlling the cooking equipment to execute corresponding cooking actions according to the control instructions determined by the action characteristics.
Optionally, the limb data comprises gesture data.
An embodiment of a third aspect of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to perform the control method of the cooking apparatus described in the above embodiments.
An embodiment of a fourth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the control method of the cooking apparatus described in the above embodiments.
Therefore, based on a microphone and a camera installed on the cooking device, the voice data of the user and the ambient noise data can be collected through the microphone, and the user image can be collected through the camera; when the confidence of the voice data and/or the ambient noise data does not meet the corresponding condition, the voice data of the user can be recognized with the assistance of the user image, so as to control the cooking device. This solves the problem in the related art that the cooking device cannot be controlled because speech is misrecognized or difficult to recognize, enables control of the cooking device, makes voice recognition more accurate, greatly improves the recognition effect, and improves the user experience.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
fig. 1 is a flowchart of a control method of a cooking apparatus according to an embodiment of the present application;
fig. 2 is a flowchart of a control method of a cooking apparatus according to an embodiment of the present application;
fig. 3 is an exemplary diagram of a control device of a cooking apparatus according to an embodiment of the present application;
fig. 4 is an exemplary diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout.
Control methods and devices of cooking devices, electronic devices, and storage media of embodiments of the present application are described below with reference to the accompanying drawings. To solve the problem mentioned in the Background section that the cooking device cannot be controlled because speech is misrecognized or difficult to recognize, the application provides a control method for a cooking device. In this method, based on a microphone and a camera installed on the cooking device, the voice data of the user and the ambient noise data can be collected through the microphone, and the user image can be collected through the camera; when the confidence of the voice data and/or the ambient noise data does not meet the corresponding condition, the voice data of the user is recognized with the assistance of the user image, so as to control the cooking device. This solves the problem in the related art that the cooking device cannot be controlled because speech is misrecognized or difficult to recognize, enables control of the cooking device, makes voice recognition more accurate, greatly improves the recognition effect, and improves the user experience.
Specifically, fig. 1 is a schematic flow chart of a control method of a cooking device according to an embodiment of the present application.
As shown in fig. 1, the control method of the cooking apparatus includes the steps of:
in step S101, voice data of a user and a user image are acquired.
It should be understood that the cooking apparatus according to the embodiments of the present application may be equipped with a camera and a microphone, collecting the user image through the camera and the user's voice data through the microphone. The cooking device may be provided with a plurality of microphones, at least some of which are arranged on different sides of the cooking device, so that voice signals collected by microphones on multiple sides can be obtained.
In step S102, when the confidence of the detected voice data is smaller than the first preset threshold, lip features are extracted based on the user image.
The first preset threshold may be a threshold preset by a user, obtained through limited experiments, or obtained through limited computer simulation. The user image is an image that includes the lips of a human face, and the lip image can be acquired at the same time as the voice data.
Specifically, after voice data (including a command word) for controlling the cooking device is collected, the embodiment of the application may compare the command word with command words in pre-stored voice data through the voice recognition system and calculate the confidence of the voice data; when the confidence of the voice data is lower than the first preset threshold, lip features are obtained. In order to better recognize lip-shape changes, the lip features may also include images of other parts of the face, because lip-shape changes are sometimes related to changes in facial expression.
In step S103, a control instruction of the cooking apparatus is determined according to the voice data and the lip feature, so as to control the cooking apparatus to perform a corresponding cooking action.
Optionally, determining a control instruction of the cooking device according to the voice data and the lip feature includes: recognizing a voice command word based on voice data, and obtaining a lip recognition result based on lip characteristics; obtaining the matching degree of the voice command word and the lip language recognition result; if the voice command word belongs to the first command word database, determining a control instruction of the cooking equipment according to the voice command word and the lip language recognition result when the matching degree is greater than or equal to a third preset threshold value; if the command word belongs to the second command word database, determining a control instruction of the cooking equipment according to the voice command word and the lip recognition result when the matching degree is larger than or equal to a fourth preset threshold value; wherein the third preset threshold is greater than the fourth preset threshold.
Optionally, in some embodiments, the method further includes: if the matching degree is smaller than the fourth preset threshold, issuing a control failure prompt to the user.
The first command word database is a database of operation command words related to cooking, for example, "heat for XX minutes" or "delay for XX minutes"; the second command word database is a database of entertainment-related command words, for example, "play music" or "tell a story". The third preset threshold and the fourth preset threshold may be thresholds preset by the user, obtained through limited experiments, or obtained through limited computer simulation, and are not specifically limited here.
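To make this two-database, two-threshold rule concrete, the following is a minimal Python sketch. The database contents, the use of difflib's SequenceMatcher as the matching-degree measure, and the 0.7/0.4 threshold values are illustrative assumptions, not part of the patent.

```python
from difflib import SequenceMatcher

# Hypothetical database contents; the patent specifies the databases' roles, not their entries.
COOKING_COMMANDS = {"heat for ten minutes", "heat for four minutes", "delay for five minutes"}
ENTERTAINMENT_COMMANDS = {"play music", "tell a story"}

THIRD_THRESHOLD = 0.7   # stricter threshold for cooking-related command words
FOURTH_THRESHOLD = 0.4  # looser threshold for entertainment command words

def match_degree(voice_cmd: str, lip_result: str) -> float:
    """Stand-in matching degree between a voice command word and a lip language recognition result."""
    return SequenceMatcher(None, voice_cmd, lip_result).ratio()

def determine_instruction(voice_cmd: str, lip_result: str) -> str | None:
    """Apply the two-threshold rule; None means a control failure prompt should follow."""
    degree = match_degree(voice_cmd, lip_result)
    if voice_cmd in COOKING_COMMANDS and degree >= THIRD_THRESHOLD:
        return voice_cmd
    if voice_cmd in ENTERTAINMENT_COMMANDS and degree >= FOURTH_THRESHOLD:
        return voice_cmd
    return None
```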
Specifically, the embodiment of the application can recognize the voice command word based on the voice data, and perform lip language recognition based on the user image to obtain the lip language recognition result. When lip language recognition is performed on the user image, the lip features can be extracted from the user image; the preset database in the embodiment of the application can store the mapping relation between action features and lip features, and after the lip features are obtained from the user image, the corresponding action features can be obtained by querying the mapping relation.
As a possible implementation manner, when extracting lip features, the embodiment of the application may extract the lip features of the user image by using a contour feature extraction method, so as to obtain a feature extraction result.
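A minimal sketch of the mapping-relation query described above, assuming the lip contour has been reduced to a fixed-length feature vector: the observed vector is compared against stored reference features by nearest neighbour, returning the associated phrase. The vector dimension, the Euclidean distance metric, and the table contents are all assumptions for illustration.

```python
import numpy as np

# Hypothetical preset database: reference lip-contour feature vectors -> recognized phrase.
LIP_FEATURE_TABLE = [
    (np.array([0.12, 0.80, 0.33, 0.57]), "heat for ten minutes"),
    (np.array([0.45, 0.21, 0.76, 0.10]), "play music"),
]

def lip_recognition_result(lip_feature: np.ndarray) -> str:
    """Return the phrase whose stored lip feature is nearest to the observed one."""
    distances = [np.linalg.norm(lip_feature - ref) for ref, _ in LIP_FEATURE_TABLE]
    return LIP_FEATURE_TABLE[int(np.argmin(distances))][1]
```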
The embodiment of the application may employ LPCC (Linear Prediction Cepstral Coefficients), MFCC (Mel-Frequency Cepstral Coefficients), HMM (Hidden Markov Model), and DTW (Dynamic Time Warping) to perform feature extraction on the speech signal.
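As an illustration of the MFCC and DTW techniques named above, here is a sketch using the librosa library; the patent names the techniques but no toolkit, so the library choice, the 16 kHz sample rate, and the coefficient count are assumptions.

```python
import librosa
import numpy as np

def mfcc_features(path: str, n_mfcc: int = 13) -> np.ndarray:
    """Extract an MFCC feature matrix (n_mfcc x frames) from an audio file."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

def dtw_distance(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Alignment cost between two MFCC sequences via dynamic time warping."""
    cost_matrix, warp_path = librosa.sequence.dtw(X=feat_a, Y=feat_b)
    return float(cost_matrix[-1, -1])  # accumulated cost of the full alignment
```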
Further, after the voice command word and the lip language recognition result are obtained, the two may be matched. For example, suppose the user issues the voice command "heat for ten minutes", but the recognition result based on the voice data alone is uncertain between "heat for four minutes" and "heat for ten minutes". To ensure control accuracy, the embodiment of the application may further combine the lip features for auxiliary recognition. Since the heating command is an operation command word related to cooking, it belongs to the first command word database, and the third preset threshold may be set to 0.7. The lip language recognition result obtained based on the lip features is "heat for ten minutes". The matching degree between the candidate "heat for four minutes" and the lip language recognition result is 0.5, which does not exceed the third preset threshold, while the matching degree between the candidate "heat for ten minutes" and the lip language recognition result is 0.9, which exceeds it. The control instruction of the cooking device can therefore be determined to be "heat for ten minutes", and the embodiment of the application can control the cooking device accordingly, for example, controlling it to heat for 10 minutes.
For another example, when the user issues a voice command to "play XXX music" and only the word "play" is recognized from the voice data, it is uncertain whether the user wants music or a story, and the embodiment of the application may further combine lip features for auxiliary recognition. Because the play command is a command word related to entertainment, it belongs to the second command word database, and the matching degree required for entertainment command words does not need to be as high, so the fourth preset threshold may be set lower than the third preset threshold, for example, to 0.4. It should be noted that when the matching degree is greater than the fourth preset threshold but the specific song is not accurately identified, the embodiment of the application may play the song the user played before, or match the song the user plays most frequently according to the user image; this may be set according to the actual situation and is not specifically limited here.
In this way, the embodiment of the application can assist voice recognition with the recognition result of the user image, thereby improving the control accuracy of the cooking device.
Optionally, in some embodiments, the control method of the cooking apparatus further includes: when the confidence coefficient of the voice data is detected to be larger than or equal to a first preset threshold value, determining a control instruction of the cooking equipment according to the voice data so as to control the cooking equipment to execute corresponding cooking actions.
It should be understood that if the confidence of the voice data is greater than or equal to the first preset threshold, it indicates that the user's voice command can be accurately identified; the embodiment of the application may directly recognize the corresponding voice command word from the voice data and determine the control instruction of the cooking device according to the command word, so as to control the cooking device to execute the corresponding cooking action.
Optionally, in some embodiments, the control method of the cooking apparatus further includes: collecting environmental noise data; if the environmental noise data is smaller than a second preset threshold value, determining a control instruction of the cooking equipment according to the voice data so as to control the cooking equipment to execute corresponding cooking actions; otherwise, determining a control instruction of the cooking equipment according to the voice data and the lip characteristics so as to control the cooking equipment to execute corresponding cooking actions.
In this embodiment, the microphone may collect ambient noise data, where the ambient noise data may include only ambient human voice; for example, based on the target voice signal obtained in the above embodiment, the remaining voice signal may be separated out and used as the ambient human voice, that is, the ambient noise data in this embodiment.
In addition, the embodiment of the present application may also store voiceprint features in advance, separate a voice signal corresponding to the voiceprint features from a voice signal collected by the microphone, and use other voice signals as environmental human voice, that is, environmental noise data in this embodiment.
It should be understood that if the ambient noise data is smaller than the second preset threshold, the ambient noise is low and has almost no influence on the recognition of the voice data; in that case, the embodiment of the application can ignore the influence of the ambient noise and determine the control instruction of the cooking device according to the voice data alone, so as to control the cooking device to execute the corresponding cooking action.
For example, assume that the first preset threshold is 0.8 and the second preset threshold is 0.3. The user says "heat for 5 minutes"; the microphone collects the user's voice, the confidence of the voice data is calculated as 0.9, and the ambient noise data is 0.2. Since the voice data confidence is higher than the first preset threshold and the ambient noise data is lower than the second preset threshold, the cooking apparatus responds to the instruction "heat for 5 minutes".
Further, when the ambient noise data is greater than or equal to the second preset threshold, the ambient noise is relatively loud and can noticeably affect the recognition of the voice data.
For example, assume that the first preset threshold is 0.8, the second preset threshold is 0.3, and the third preset threshold is 0.8. The user says "heat for 10 minutes". Because the ambient noise data is 0.5, which is greater than the second preset threshold, the cooking device matches the command word recognized from the voice data against the lip language recognition result obtained from the user image. The matching value is 0.9, which is greater than the third preset threshold of 0.8, and the joint recognition result based on the voice data and the user image is "heat for 10 minutes", so the cooking device responds to the instruction "heat for 10 minutes".
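The two worked examples can be folded into a single decision routine. A minimal sketch, using the threshold values from the examples (0.8, 0.3, and 0.8) as assumed constants and difflib as a stand-in for the matching-degree computation:

```python
from difflib import SequenceMatcher

FIRST_THRESHOLD = 0.8   # minimum voice-data confidence for voice-only control
SECOND_THRESHOLD = 0.3  # maximum ambient-noise level for voice-only control
THIRD_THRESHOLD = 0.8   # minimum voice/lip matching degree in noisy conditions

def decide(voice_cmd: str, confidence: float, noise: float, lip_result: str) -> str | None:
    """Return the instruction to execute, or None if recognition is not trusted."""
    if noise < SECOND_THRESHOLD:
        # Quiet environment: the voice channel alone is credible if confident enough.
        return voice_cmd if confidence >= FIRST_THRESHOLD else None
    # Noisy environment: require agreement between the voice and lip-reading channels.
    degree = SequenceMatcher(None, voice_cmd, lip_result).ratio()
    return voice_cmd if degree >= THIRD_THRESHOLD else None

# First example: confidence 0.9, noise 0.2 -> voice alone suffices.
assert decide("heat for 5 minutes", 0.9, 0.2, "") == "heat for 5 minutes"
# Second example: noise 0.5 forces joint recognition; agreeing channels match above 0.8.
assert decide("heat for 10 minutes", 0.6, 0.5, "heat for 10 minutes") == "heat for 10 minutes"
```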
Optionally, in some embodiments, the control method of the cooking apparatus further includes: after the control failure prompt is issued, collecting limb data of the user; extracting action features from the limb data; and controlling the cooking device to execute the corresponding cooking action according to the control instruction determined from the action features.
Optionally, the limb data comprises gesture data.
It should be appreciated that if the user does not speak fluently enough, the confidence of the voice data and of the user image recognition may both fall below the corresponding preset thresholds. To further improve recognition accuracy, in the embodiment of the application the user can be recognized through limb data after the control failure prompt, where the limb data may be gesture data. The gesture data may include motion features such as moving from left to right, right to left, top to bottom, bottom to top, or in a circle or half circle.
Specifically, the control instruction corresponding to each motion feature of the gesture data may be preset; for example, moving from left to right may correspond to heating for five minutes, and moving from right to left to heating for ten minutes, without specific limitation here. The corresponding control instruction can thus be determined according to the motion feature of the gesture data, and the cooking device controlled accordingly.
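The preset gesture-to-instruction mapping could be as simple as a lookup table. In the sketch below, the left/right entries mirror the examples in the text; the third entry and the key names are assumptions.

```python
from typing import Optional

# Hypothetical mapping from gesture motion features to control instructions.
GESTURE_COMMANDS = {
    "left_to_right": "heat for five minutes",
    "right_to_left": "heat for ten minutes",
    "top_to_bottom": "stop heating",  # assumed, not from the text
}

def gesture_instruction(motion_feature: str) -> Optional[str]:
    """Resolve a recognized gesture motion feature to a control instruction."""
    return GESTURE_COMMANDS.get(motion_feature)
```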
In order to enable those skilled in the art to further understand the control method of the cooking apparatus according to the embodiments of the present application, the following details are described in connection with specific embodiments.
As shown in fig. 2, the control method of the cooking apparatus includes the steps of:
s201, collecting voice and video to obtain voice data, environment noise data and user images.
S202, recognizing voice command words based on voice data, and obtaining lip language recognition results based on user images.
S203, combining the environmental noise data, the recognition result of the voice command word or the matching degree of the recognition result of the voice command word and the lip language recognition result, and comprehensively calculating to obtain the control instruction of the cooking equipment.
S204, controlling the cooking equipment to execute corresponding actions according to the control instruction.
It can be seen that when the user's voice data is collected, the confidence of the voice data and the voice command word are obtained through calculation, and at the same time the lip language recognition result is obtained from the speaker's mouth shape. When the ambient noise data is lower than the second preset threshold, the command word is considered credible if the confidence of the voice data is higher than the preset command-word confidence threshold (that is, the first preset threshold). When the ambient noise data is greater than or equal to the second preset threshold, the ambient noise is relatively loud and a further judgment must be made in combination with the user image: a matching value is obtained by matching the voice command word against the lip language recognition result, the corresponding preset threshold is determined according to the command word database to which the voice command word belongs, and the matching value is compared against that threshold. Once the matching value exceeds the threshold, the control instruction of the cooking device is determined and the cooking device is controlled to execute the corresponding cooking action.
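Assembling the pieces, the overall flow of S201 to S204 might look like the sketch below. The helpers determine_instruction, lip_recognition_result, and gesture_instruction are the illustrative functions sketched earlier; the sensing and actuation calls (recognize_voice, extract_lip_feature, capture_limb_data, extract_motion_feature, execute, prompt_control_failure) are placeholders for device-specific code, not the patent's API.

```python
def control_cooking_device(audio, user_image, noise_level: float) -> None:
    """Orchestration sketch of steps S201-S204: sense, recognize, decide, act."""
    voice_cmd, confidence = recognize_voice(audio)  # placeholder ASR front end
    if noise_level < SECOND_THRESHOLD and confidence >= FIRST_THRESHOLD:
        execute(voice_cmd)  # quiet environment and confident recognition: voice alone
        return
    # Otherwise fall back to joint voice + lip recognition.
    lip_result = lip_recognition_result(extract_lip_feature(user_image))
    instruction = determine_instruction(voice_cmd, lip_result)
    if instruction is not None:
        execute(instruction)
        return
    # Matching degree too low: prompt failure, then try gesture control.
    prompt_control_failure()
    motion = extract_motion_feature(capture_limb_data())
    instruction = gesture_instruction(motion)
    if instruction is not None:
        execute(instruction)
```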
According to the control method of the cooking equipment provided by the embodiment of the application, based on a microphone and a camera installed on the cooking equipment, the voice data of the user and the ambient noise data can be collected through the microphone, and the user image can be collected through the camera; when the confidence of the voice data and/or the ambient noise data does not meet the corresponding condition, the voice data of the user is recognized with the assistance of the user image, so as to control the cooking equipment. This solves the problem in the related art that the cooking equipment cannot be controlled because speech is misrecognized or difficult to recognize, enables control of the cooking equipment, makes voice recognition more accurate, greatly improves the recognition effect, and improves the user experience.
Next, a control device of a cooking apparatus according to an embodiment of the present application will be described with reference to the accompanying drawings.
Fig. 3 is a block schematic diagram of a control device of a cooking apparatus according to an embodiment of the present application.
As shown in fig. 3, the control device 10 of the cooking apparatus includes: the device comprises a first acquisition module 100, an extraction module 200 and a first control module 300.
The first acquisition module 100 is used for acquiring voice data and user images of a user;
the extraction module 200 is configured to extract lip features based on the user image when the confidence level of the detected voice data is less than a first preset threshold; and
the first control module 300 is configured to determine a control instruction of the cooking apparatus according to the voice data and the lip feature, so as to control the cooking apparatus to perform a corresponding cooking action.
Optionally, the device further comprises:
and the second control module is used for determining a control instruction of the cooking equipment according to the voice data when the confidence coefficient of the voice data is detected to be greater than or equal to a first preset threshold value so as to control the cooking equipment to execute corresponding cooking actions.
Optionally, the device further comprises:
the second acquisition module is used for acquiring environmental noise data;
the third control module is used for determining a control instruction of the cooking equipment according to the voice data if the environmental noise data is smaller than a second preset threshold value so as to control the cooking equipment to execute corresponding cooking actions; otherwise, determining a control instruction of the cooking equipment according to the voice data and the lip characteristics so as to control the cooking equipment to execute corresponding cooking actions.
Optionally, the first control module is specifically configured to:
recognizing a voice command word based on voice data, and obtaining a lip recognition result based on lip characteristics;
obtaining the matching degree of the voice command word and the lip language recognition result;
if the command word belongs to the first command word database, determining a control instruction of the cooking equipment according to the voice command word and the lip recognition result when the matching degree is larger than or equal to a third preset threshold value;
if the command word belongs to the second command word database, determining a control instruction of the cooking equipment according to the voice command word and the lip recognition result when the matching degree is larger than or equal to a fourth preset threshold value; wherein the third preset threshold is greater than the fourth preset threshold.
Optionally, the device further comprises:
and the prompting module is used for prompting the control failure of the user if the matching degree is smaller than a fourth preset threshold value.
Optionally, the device further comprises:
the third acquisition module is used for acquiring limb data of a user after control failure prompt is carried out;
and the fourth control module is used for extracting action characteristics according to the limb data and controlling the cooking equipment to execute corresponding cooking actions according to the control instructions determined by the action characteristics.
Optionally, the limb data comprises gesture data.
It should be noted that the foregoing explanation of the embodiment of the control method of the cooking apparatus is also applicable to the control device of the cooking apparatus of this embodiment, and will not be repeated here.
According to the control device of the cooking equipment provided by the embodiment of the application, based on a microphone and a camera installed on the cooking equipment, the voice data of the user and the ambient noise data can be collected through the microphone, and the user image can be collected through the camera; when the confidence of the voice data and/or the ambient noise data does not meet the corresponding condition, the voice data of the user can be recognized with the assistance of the user image, so as to control the cooking equipment. This solves the problem in the related art that the cooking equipment cannot be controlled because speech is misrecognized or difficult to recognize, enables control of the cooking equipment, makes voice recognition more accurate, greatly improves the recognition effect, and improves the user experience.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
memory 401, processor 402, and a computer program stored on memory 401 and executable on processor 402.
The processor 402 implements the control method of the cooking apparatus provided in the above-described embodiment when executing a program.
Further, the electronic device further includes:
a communication interface 403 for communication between the memory 401 and the processor 402.
A memory 401 for storing a computer program executable on the processor 402.
The memory 401 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
If the memory 401, the processor 402, and the communication interface 403 are implemented independently, the communication interface 403, the memory 401, and the processor 402 may be connected to each other by a bus and communicate with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, only one thick line is shown in fig. 4, but this does not mean there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the memory 401, the processor 402, and the communication interface 403 are integrated on a chip, the memory 401, the processor 402, and the communication interface 403 may perform communication with each other through internal interfaces.
The processor 402 may be a central processing unit (Central Processing Unit, abbreviated as CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, abbreviated as ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the control method of a cooking apparatus as above.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples, without contradiction.
Any process or method descriptions in flowcharts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes further implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer cartridge (magnetic device), a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber device, and a portable Compact Disc Read-Only Memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program can be electronically captured, for instance by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, Programmable Gate Arrays (PGAs), Field-Programmable Gate Arrays (FPGAs), and the like.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that changes, modifications, substitutions, and variations may be made to the above embodiments by those of ordinary skill in the art within the scope of the application.

Claims (9)

1. A control method of a cooking apparatus, comprising the steps of:
collecting voice data and user images of a user;
extracting lip features based on the user image when the confidence level of the voice data is detected to be smaller than a first preset threshold value; and
determining a control instruction of the cooking equipment according to the voice data and the lip characteristics so as to control the cooking equipment to execute corresponding cooking actions;
the determining the control instruction of the cooking device according to the voice data and the lip feature comprises the following steps:
recognizing a voice command word based on the voice data, and obtaining a lip language recognition result based on the lip feature;
obtaining the matching degree of the voice command word and the lip language recognition result;
if the command word belongs to a first command word database, determining a control instruction of the cooking equipment according to the voice command word and the lip language identification result when the matching degree is larger than or equal to a third preset threshold value;
if the command word belongs to a second command word database, determining a control instruction of the cooking equipment according to the voice command word and the lip language identification result when the matching degree is larger than or equal to a fourth preset threshold value; wherein the third preset threshold is greater than the fourth preset threshold;
the first command word database is a database formed by operation command words related to cooking,
the second command word database is a database composed of command words related to entertainment.
2. The method as recited in claim 1, further comprising:
when the confidence coefficient of the voice data is detected to be larger than or equal to a first preset threshold value, determining a control instruction of the cooking equipment according to the voice data so as to control the cooking equipment to execute corresponding cooking actions.
3. The method as recited in claim 1, further comprising:
collecting environmental noise data;
if the environmental noise data is smaller than a second preset threshold value, determining a control instruction of the cooking equipment according to the voice data so as to control the cooking equipment to execute corresponding cooking actions;
otherwise, determining a control instruction of the cooking equipment according to the voice data and the lip characteristics so as to control the cooking equipment to execute corresponding cooking actions.
4. The method as recited in claim 1, further comprising:
and if the matching degree is smaller than a fourth preset threshold value, performing control failure prompt on the user.
5. The method as recited in claim 4, further comprising:
after control failure prompt is carried out, acquiring limb data of a user;
and extracting action characteristics according to the limb data, and controlling the cooking equipment to execute corresponding cooking actions according to the control instructions determined by the action characteristics.
6. The method of claim 5, wherein the limb data comprises gesture data.
7. A control device of a cooking apparatus, comprising:
the first acquisition module is used for acquiring voice data of a user and a user image;
the extraction module is used for extracting lip features based on the user image when the confidence coefficient of the voice data is detected to be smaller than a first preset threshold value; and
the first control module is used for determining control instructions of the cooking equipment according to the voice data and the lip characteristics so as to control the cooking equipment to execute corresponding cooking actions;
the determining the control instruction of the cooking device according to the voice data and the lip feature comprises the following steps:
recognizing a voice command word based on the voice data, and obtaining a lip language recognition result based on the lip feature;
obtaining the matching degree of the voice command word and the lip language recognition result;
if the command word belongs to a first command word database, determining a control instruction of the cooking equipment according to the voice command word and the lip language identification result when the matching degree is larger than or equal to a third preset threshold value;
if the command word belongs to a second command word database, determining a control instruction of the cooking equipment according to the voice command word and the lip language identification result when the matching degree is larger than or equal to a fourth preset threshold value; wherein the third preset threshold is greater than the fourth preset threshold;
the first command word database is a database formed by operation command words related to cooking,
the second command word database is a database composed of command words related to entertainment.
8. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the control method of a cooking apparatus according to any one of claims 1-6.
9. A computer-readable storage medium, on which a computer program is stored, characterized in that the program is executed by a processor for realizing the control method of a cooking apparatus according to any one of claims 1-6.
CN202110963243.5A 2021-08-20 2021-08-20 Control method and device of cooking equipment, electronic equipment and storage medium Active CN113689858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110963243.5A CN113689858B (en) 2021-08-20 2021-08-20 Control method and device of cooking equipment, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110963243.5A CN113689858B (en) 2021-08-20 2021-08-20 Control method and device of cooking equipment, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113689858A CN113689858A (en) 2021-11-23
CN113689858B true CN113689858B (en) 2024-01-05

Family

ID=78581093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110963243.5A Active CN113689858B (en) 2021-08-20 2021-08-20 Control method and device of cooking equipment, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113689858B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107799125A (en) * 2017-11-09 2018-03-13 维沃移动通信有限公司 A kind of audio recognition method, mobile terminal and computer-readable recording medium
CN111028842A (en) * 2019-12-10 2020-04-17 上海芯翌智能科技有限公司 Method and equipment for triggering voice interaction response
CN111326152A (en) * 2018-12-17 2020-06-23 南京人工智能高等研究院有限公司 Voice control method and device
CN211481445U (en) * 2020-03-02 2020-09-11 科大讯飞股份有限公司 Voice acquisition intelligent earphone based on acoustic image coupling
CN112420033A (en) * 2019-08-23 2021-02-26 声音猎手公司 Vehicle-mounted device and method for processing words

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11682416B2 (en) * 2018-08-03 2023-06-20 International Business Machines Corporation Voice interactions in noisy environments

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107799125A (en) * 2017-11-09 2018-03-13 维沃移动通信有限公司 A kind of audio recognition method, mobile terminal and computer-readable recording medium
CN111326152A (en) * 2018-12-17 2020-06-23 南京人工智能高等研究院有限公司 Voice control method and device
CN112420033A (en) * 2019-08-23 2021-02-26 声音猎手公司 Vehicle-mounted device and method for processing words
CN111028842A (en) * 2019-12-10 2020-04-17 上海芯翌智能科技有限公司 Method and equipment for triggering voice interaction response
CN211481445U (en) * 2020-03-02 2020-09-11 科大讯飞股份有限公司 Voice acquisition intelligent earphone based on acoustic image coupling

Also Published As

Publication number Publication date
CN113689858A (en) 2021-11-23

Similar Documents

Publication Publication Date Title
KR102339594B1 (en) Object recognition method, computer device, and computer-readable storage medium
US10762899B2 (en) Speech recognition method and apparatus based on speaker recognition
JP6596376B2 (en) Speaker identification method and speaker identification apparatus
US9633652B2 (en) Methods, systems, and circuits for speaker dependent voice recognition with a single lexicon
US9196247B2 (en) Voice recognition method and voice recognition apparatus
TWI466101B (en) Method and system for speech recognition
WO2016150001A1 (en) Speech recognition method, device and computer storage medium
US20120130716A1 (en) Speech recognition method for robot
US11727939B2 (en) Voice-controlled management of user profiles
CN110211599B (en) Application awakening method and device, storage medium and electronic equipment
US9530417B2 (en) Methods, systems, and circuits for text independent speaker recognition with automatic learning features
US9595261B2 (en) Pattern recognition device, pattern recognition method, and computer program product
CN108932944B (en) Decoding method and device
US9691389B2 (en) Spoken word generation method and system for speech recognition and computer readable medium thereof
US20180033427A1 (en) Speech recognition transformation system
CN106558306A (en) Method for voice recognition, device and equipment
CN111261195A (en) Audio testing method and device, storage medium and electronic equipment
US10861447B2 (en) Device for recognizing speeches and method for speech recognition
US11437022B2 (en) Performing speaker change detection and speaker recognition on a trigger phrase
US11081115B2 (en) Speaker recognition
CN109065026B (en) Recording control method and device
CN110853669A (en) Audio identification method, device and equipment
CN113689858B (en) Control method and device of cooking equipment, electronic equipment and storage medium
CN109741761B (en) Sound processing method and device
CN111369992A (en) Instruction execution method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant