CN110853619B - Man-machine interaction method, control device, controlled device and storage medium


Info

Publication number
CN110853619B
Authority
CN
China
Prior art keywords
voice signal
control
voice
user
face
Prior art date
Legal status
Active
Application number
CN201810955004.3A
Other languages
Chinese (zh)
Other versions
CN110853619A (en)
Inventor
郭涛
杨春阳
Current Assignee
Shanghai Pateo Network Technology Service Co Ltd
Original Assignee
Shanghai Pateo Network Technology Service Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Pateo Network Technology Service Co Ltd
Priority to CN201810955004.3A
Publication of CN110853619A
Application granted
Publication of CN110853619B

Classifications

    • G PHYSICS
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
                • G10L15/00 Speech recognition
                    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
                    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
                        • G10L2015/223 Execution procedure of a spoken command
                        • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
                • G10L17/00 Speaker identification or verification techniques
                    • G10L17/22 Interactive procedures; Man-machine interfaces
    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
                • H04L12/00 Data switching networks
                    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
                        • H04L12/2803 Home automation networks
                            • H04L12/2816 Controlling appliance services of a home automation network by calling their functionalities
                                • H04L12/282 Controlling appliance services of a home automation network by calling their functionalities based on user interaction within the home

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Automation & Control Theory (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Selective Calling Equipment (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention belongs to the technical field of intelligent control and relates to a human-computer interaction method, a control device, a controlled device and a storage medium. The human-computer interaction method comprises the following steps: a voice signal is received; characteristics of the voice signal source are detected, the characteristics comprising the face orientation of the user who uttered the voice signal or the relative orientation of the user and the controlled device; and whether the voice signal includes a wake-up word is determined. If the voice signal includes the wake-up word, voice command recognition is performed on the voice signal to obtain a voice command. If the voice signal does not include the wake-up word, voice command recognition is performed on the voice signal to obtain a voice command only when the characteristics of the voice signal source match preset characteristics, the preset characteristics comprising that the user's face is oriented toward the front of the controlled device/control device or that the user is positioned in front of the controlled device. The invention thereby effectively avoids false triggering of the controlled device and improves the accuracy of the human-computer interaction method.

Description

Man-machine interaction method, control device, controlled device and storage medium
Technical Field
The invention belongs to the technical field of intelligent control, and particularly relates to a man-machine interaction method, a control device, a controlled device and a storage medium.
Background
With the popularization of intelligent terminals and the emergence of more and more smart devices and smart homes, human-computer interaction has become a core function. As speech recognition technology develops, more and more smart devices adopt voice control for human-computer interaction: when an existing voice terminal detects a voice control instruction, it responds with the control code corresponding to that instruction based on a pre-stored mapping between voice control instructions and control codes. This is the voice assistant function of human-computer interaction. At present, most intelligent terminals have a voice assistant function, and a specific utterance (e.g., a wake-up word) is generally required to trigger it and put the voice assistant into a state awaiting voice input. For example, with a powered-on intelligent terminal that has the voice assistant function, simply saying "Xiao AI classmate" wakes up the voice assistant service.
However, the current voice assistant triggers voice control accurately only when a wake-up word is present; in natural-language speech without a wake-up word, it cannot reliably determine who the speech is addressed to, so false trigger control instructions are easily generated. For example, when a user says "watch TV", there are two possible situations: either the user really wants to turn on the home television, or the user is merely chatting with other people and the phrase "watch TV" comes up in conversation. In the second situation, the voice assistant is prone to turning on the television by mistake.
In view of the above problems, those skilled in the art have sought solutions.
Disclosure of Invention
In view of this, the present invention provides a human-computer interaction method, a control device, a controlled device and a storage medium, and aims to improve the accuracy of human-computer interaction.
The invention is realized by the following steps:
The invention provides a human-computer interaction method comprising the following steps: a voice signal is received; characteristics of the voice signal source are detected, the characteristics comprising the face orientation of the user who uttered the voice signal or the relative orientation of the user and a controlled device; and whether the voice signal includes a wake-up word is determined. If the voice signal includes the wake-up word, voice command recognition is performed on the voice signal to obtain a voice command. If the voice signal does not include the wake-up word, voice command recognition is performed on the voice signal to obtain a voice command when the characteristics of the voice signal source match preset characteristics, the preset characteristics comprising that the user's face is oriented toward the front of the controlled device/control device or that the user is positioned in front of the controlled device.
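For illustration only, the branching logic described above can be sketched in Python as follows; the wake-word list, the feature flags and the placeholder recognize_command are assumptions of this sketch, not limitations of the claimed method:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class SourceFeatures:
        face_toward_front: bool   # user's face oriented toward the front of the device
        user_in_front: bool       # user positioned in front of the controlled device

    # Hypothetical wake words; the patent leaves the actual vocabulary open.
    WAKE_WORDS = ("xiao ai classmate", "voice assistant")

    def matches_preset(f: SourceFeatures) -> bool:
        # Preset characteristics: face toward the front, or user in front of the device.
        return f.face_toward_front or f.user_in_front

    def recognize_command(text: str) -> str:
        # Placeholder for a real voice-command recognizer.
        return text.strip()

    def handle_voice_signal(text: str, features: SourceFeatures) -> Optional[str]:
        """Return a recognized command, or None when the signal should be ignored."""
        if any(w in text.lower() for w in WAKE_WORDS):
            return recognize_command(text)   # wake word present: always recognize
        if matches_preset(features):
            return recognize_command(text)   # no wake word, but features match
        return None                          # likely ordinary conversation: ignore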
Further, the step of detecting the characteristics of the voice signal source includes: detecting the time for which the user's eyes are focused on the controlled device/control device. The preset characteristics further comprise that the time for which the user's eyes are focused on the controlled device/control device is greater than a threshold.
Further, before the step of determining whether the voice signal includes a wake-up word, the method includes: acquiring the user's face and determining whether it matches a pre-stored specific face. When the user's face matches the pre-stored specific face, the method proceeds to the step of determining whether the voice signal includes the wake-up word; when it does not match, the method returns to the step of receiving a voice signal.
Further, after the step of performing voice command recognition on the voice signal to obtain the voice command, the method includes: entering a human-machine dialogue mode according to the voice command, outputting corresponding dialogue speech, and/or performing corresponding control according to the voice command.
Further, the preset characteristics include that the user's face is oriented toward the front of the control device. After the step of performing voice command recognition on the voice signal to obtain the voice command, the method includes: determining whether the voice command includes a control object, the control object comprising at least one household appliance. If the voice command does not include a control object, household appliance control big data is acquired according to the current environment information and/or the current time information, and at least one household appliance, together with the appliance control information corresponding to each appliance, is obtained from the big data, so that each appliance is controlled according to its own control information. If the voice command includes a control object, the control object is controlled according to the voice command.
Further, if the voice command includes a control object, the step of controlling the control object includes: detecting the user's face and acquiring historical control information of the control object corresponding to that face, so that the control object is controlled according to the historical control information, the control object comprising a television and/or a music player and/or an electric lamp.
The present invention also provides a control device comprising: a voice signal receiving module for receiving a voice signal; a feature detection module, connected to the voice signal receiving module, for detecting the characteristics of the voice signal source, the characteristics comprising the face orientation of the user who uttered the voice signal or the relative orientation of the user and a controlled device; a wake-up word recognition module, connected to the voice signal receiving module, for determining whether the voice signal includes a wake-up word; and a voice command acquisition module for performing voice command recognition on the voice signal to obtain a voice command when the voice signal includes the wake-up word and, when the voice signal does not include the wake-up word, performing voice command recognition on the voice signal to obtain the voice command only when the characteristics of the voice signal source match preset characteristics, the preset characteristics comprising that the user's face is oriented toward the front of the controlled device/control device or that the user is positioned in front of the controlled device.
The invention also provides a controlled device comprising the above control device.
The invention also provides a control device comprising a processor for executing a computer program stored in a memory to implement the steps of the human-computer interaction method described above.
The invention also provides a computer-readable storage medium, in which a computer program is stored, which, when executed by a processor, implements the steps of the above-described human-computer interaction method.
In the invention, after a voice signal is received, the characteristics of the voice signal source are detected, the characteristics comprising the face orientation of the user who uttered the voice signal or the relative orientation of the user and a controlled device. Whether the voice signal includes a wake-up word is determined. If the voice signal includes the wake-up word, voice command recognition is performed on the voice signal to obtain a voice command. If the voice signal does not include the wake-up word, the step of performing voice command recognition on the voice signal to obtain a voice command is entered only when the characteristics of the voice signal source match the preset characteristics, the preset characteristics comprising that the user's face is oriented toward the front of the controlled device/control device or that the user is positioned in front of the controlled device. False triggering of the controlled device is thereby effectively avoided.
In order to make the aforementioned and other objects, features and advantages of the invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
Fig. 1 is a schematic flowchart of a human-computer interaction method according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a human-computer interaction method according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a control device according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a controlled device according to a fourth embodiment of the present invention;
fig. 5 is a schematic structural diagram of a control device according to a fifth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Example one:
Fig. 1 is a schematic flowchart of the human-computer interaction method according to the first embodiment of the present invention. For a clear description of the method provided by the first embodiment, please refer to fig. 1.
The man-machine interaction method provided by the embodiment of the invention comprises the following steps:
and S101, receiving a voice signal.
In an embodiment, the device/apparatus applying the human-computer interaction method provided by this embodiment is in a mute detection state before receiving a voice signal; in this state its power consumption is extremely low, so the device/apparatus can keep running for a long time.
In one embodiment, step S101 may further include: proceeding to step S102 only when the volume of the received voice signal reaches a certain threshold.
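A minimal sketch of such a volume gate is given below; the RMS/dBFS measure and the -40 dB default are assumptions for illustration, since the text only requires "a certain threshold":

    import numpy as np

    def loud_enough(samples: np.ndarray, threshold_db: float = -40.0) -> bool:
        """Gate on signal level before running the full detection pipeline.

        `samples` is assumed to be normalized to [-1, 1]."""
        rms = float(np.sqrt(np.mean(np.square(samples.astype(np.float64)))))
        if rms == 0.0:
            return False
        return 20.0 * np.log10(rms) >= threshold_db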
S102, detecting the characteristics of the voice signal source.
Specifically, the voice signal source includes, but is not limited to, a user who utters a voice signal. The characteristics of the source of the voice signal may include the face orientation of the user originating the voice signal or the relative orientation of the user originating the voice signal and the controlled device.
In an embodiment, the detection of the face orientation of the user who sends out the voice signal may be performed by the control apparatus or the controlled device through an image capturing apparatus, wherein the image capturing apparatus may be, but is not limited to being, integrated in the control apparatus or the controlled device.
In one embodiment, the detection of the relative orientation of the user who utters the voice signal and the controlled device may be performed by the control device or the controlled device through an image acquisition device and/or a sound source positioning device, where the image acquisition device and/or the sound source positioning device may be integrated in the controlled device or the control device.
In one embodiment, the control device may perform unified control on a plurality of controlled devices, where the controlled devices may be, for example, electronic curtains, televisions, electronic doors, air conditioners, electric lamps, and the like. In other embodiments, the control device may control only one controlled device, and the control device may be integrated in the controlled device.
S103, judging whether the voice signal comprises a wake-up word.
In one embodiment, the wake-up word refers to a specific phrase used to wake up the control device or the controlled device. The wake-up word may be the name of the device or the name of a voice recognition program in the device, such as "Tmall Genie", "Xiao Ai classmate", "Voice Assistant", or the like.
In other embodiments, before step S103 (judging whether the voice signal includes the wake-up word), the method may include: acquiring the user's face and determining whether it matches a pre-stored specific face. When the user's face matches the pre-stored specific face, the method proceeds to step S103; when it does not match, the method returns to step S101. The pre-stored specific faces may be captured in advance by the controlled device or the control device through the image acquirer and stored, and when a plurality of specific faces are stored, one or more titles (e.g., a person's name, a relationship title, etc.) may be set for and associated with each specific face.
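The patent does not fix a face-matching algorithm; one common realization compares face embeddings by cosine similarity, as in the assumed sketch below, where the 0.6 threshold and the embedding representation are illustrative choices:

    import numpy as np

    def match_stored_face(embedding: np.ndarray, stored: dict, threshold: float = 0.6):
        """Return the title of the best-matching pre-stored face, or None.

        `stored` maps a title (e.g. a person's name or a relationship title)
        to a reference embedding captured in advance."""
        best_title, best_score = None, threshold
        for title, ref in stored.items():
            score = float(np.dot(embedding, ref)
                          / (np.linalg.norm(embedding) * np.linalg.norm(ref)))
            if score > best_score:
                best_title, best_score = title, score
        return best_title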
S104, if the voice signal includes the wake-up word, performing voice command recognition on the voice signal to obtain the voice command.
In one embodiment, after the step of performing voice command recognition on the voice signal, the method includes: entering a human-machine dialogue mode according to the voice command, outputting corresponding dialogue speech, and/or performing corresponding control according to the voice command. For example, if the voice command is to turn on the television, the control device controls the television to turn on (or the television turns itself on), and the control device or the television can then ask "What program do you want to watch?" and jump to the program once the user names it.
In other embodiments, after the user's face is acquired and matches a pre-stored specific face, steps S103 and S104 are performed and a human-machine dialogue mode is then entered according to the voice command. For example, when the acquired face matches a pre-stored specific face, the control device or the controlled device addresses the user by voice, giving the user a more personal experience; further, personalized dialogue can be conducted according to the title associated with the specific face, so that different users receive different human-computer interaction experiences.
In one embodiment, while the voice command is obtained from the voice signal, the signal is also analyzed through multiple modalities such as speech recognition, semantic understanding, and image detection and recognition, and a learning model is built, so that a more intelligent and personalized dialogue mode can be realized and user experience improved. For example, after exercising, the user says "So hot, help me turn on the air conditioner". Speech recognition and semantic understanding are applied to the voice signal together with the user's image information, from which it can be inferred that the user feels hot because of the exercise rather than because the room is hot, so the device/apparatus can reply with a voice prompt: "You have just exercised; it is advisable to rest for a while before turning on the air conditioner."
In one embodiment, when the voice signal includes only the wake-up word, the user may be actively prompted to issue a voice command; further, when the voice signal includes only the wake-up word and no voice command is detected within a preset time period, a voice prompt may be given to the user (e.g., "What can I do for you?").
S105, if the voice signal does not include the wake-up word, judging whether the characteristics of the voice signal source match the preset characteristics; if so, executing step S104, and if not, returning to step S101.
Specifically, the preset characteristics include that the user's face is oriented toward the controlled device/control device, or that the user is positioned in front of the controlled device.
In one embodiment, the step of detecting the characteristics of the voice signal source includes: after the user's face orientation is detected, and when the user's eye features can be detected, detecting the time for which the user's eyes are focused on the controlled device/control device. The preset characteristics in step S105 may then further include that this focus time is greater than a threshold. Accordingly, when the voice signal does not include the wake-up word but the characteristics of the voice signal source match the preset characteristics, step S104 is executed: for example, when the user's face is detected to be oriented toward the controlled device/control device and the user's eyes are detected to have been focused on it for longer than the threshold, voice command recognition is performed on the voice signal to obtain the voice command. In this way, even without a wake-up word, the user's face orientation and eye-focus state determine whether the voice signal should be recognized to obtain a voice command, which effectively avoids false triggering of the controlled device/control device while the user is speaking in natural language (e.g., chatting) and greatly improves the accuracy of the human-computer interaction method.
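A minimal sketch of such a dwell-time check, assuming a monotonic clock and an illustrative 1.5-second threshold (the patent leaves the threshold value open):

    import time

    class GazeDwellTimer:
        """Accumulate how long the user's eyes stay focused on the device."""
        FOCUS_THRESHOLD_S = 1.5   # assumed value; only "greater than a threshold" is required

        def __init__(self):
            self._focus_start = None

        def update(self, eyes_on_device: bool, now=None) -> bool:
            now = time.monotonic() if now is None else now
            if not eyes_on_device:
                self._focus_start = None   # focus broken: reset the timer
                return False
            if self._focus_start is None:
                self._focus_start = now    # focus just began
            return (now - self._focus_start) >= self.FOCUS_THRESHOLD_S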
In other embodiments, when the user's face is oriented toward the front of the control device, the control device can determine whether the voice command (obtained from the voice signal) includes a control object, where the control object may include at least one controlled device (i.e., the control device may control at least one controlled device). Further, when the voice command does not include a control object, the control device may automatically control the corresponding controlled devices according to control big data. Furthermore, when the voice command includes a control object, the control device can automatically control that object according to its historical control information, making the human-computer interaction method provided by the invention more intelligent.
In one embodiment, the step of detecting the characteristics of the voice signal source includes: detecting the relative orientation of the user who uttered the voice signal and the controlled device. When the voice signal does not include the wake-up word but the relative orientation of the user and the controlled device matches the preset characteristics, the method proceeds to the step of obtaining the voice command. The relative orientation matches the preset characteristics when, for example, the user is in front of the controlled device (e.g., in front of a television), or the distance between the user and the controlled device is less than a threshold (e.g., less than 5 meters from a fan), and so on.
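One way to realize this check, assuming 2-D positions obtained from the image acquisition and/or sound source positioning device; the 5-meter limit mirrors the fan example above, while the 45-degree frontal half-angle is an assumption:

    import math

    def in_front_of_device(user_xy, device_xy, device_facing_deg,
                           max_distance_m=5.0, half_angle_deg=45.0) -> bool:
        """Check whether the user stands within a frontal cone of the device."""
        dx, dy = user_xy[0] - device_xy[0], user_xy[1] - device_xy[1]
        if math.hypot(dx, dy) > max_distance_m:
            return False
        bearing = math.degrees(math.atan2(dy, dx))   # direction from device to user
        diff = (bearing - device_facing_deg + 180.0) % 360.0 - 180.0
        return abs(diff) <= half_angle_deg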
In summary, the human-computer interaction method provided by the first embodiment of the invention detects the characteristics of the voice signal source after receiving the voice signal and determines whether the voice signal includes a wake-up word. If the voice signal includes the wake-up word, voice command recognition is performed on the voice signal to obtain the voice command. If the voice signal does not include the wake-up word, the step of performing voice command recognition on the voice signal to obtain the voice command is entered only when the characteristics of the voice signal source match the preset characteristics, the preset characteristics including that the user's face is oriented toward the front of the controlled device/control device or that the user is positioned in front of the controlled device.
Example two:
Fig. 2 is a schematic flowchart of the human-computer interaction method according to the second embodiment of the present invention. For a clear description of the method provided by the second embodiment, please refer to fig. 2.
The human-computer interaction method provided by the second embodiment of the invention is applied to a control device and comprises the following steps:
S201, receiving a voice signal.
S202, detecting the characteristics of the voice signal source.
Specifically, the characteristics of the voice signal source may include the face orientation of the user who uttered the voice signal or the relative orientation of that user and the controlled device, where the voice signal source includes, but is not limited to, the user who uttered the voice signal. The characteristics of the voice signal source are detected immediately after the voice signal is received, so that the user's state at the moment of speaking is captured in time; this prevents the detected characteristics from becoming inaccurate because the user moves or changes posture after speaking, and thus helps ensure the accuracy of the subsequent steps.
In an embodiment, the detecting of the face orientation of the user who sends the voice signal may be performed by the control device or the controlled device through an image capturing device (e.g., a camera), for example, when the control device receives the voice signal, the image capturing device of the control device is turned on to detect the face orientation of the user. For another example, when the control device receives the voice signal, the controlled device may be controlled to turn on the image capturing device to detect the face orientation of the user.
In an embodiment, the image capturing device may be included in the control device or the controlled device, or may be external to them.
In one embodiment, the control apparatus may perform unified management on a plurality of controlled devices, such as electronic curtains, televisions, electronic doors, air conditioners, electric lights, and the like.
S203, judging whether the voice signal comprises a wake-up word.
S204, if the voice signal includes the wake-up word, performing voice command recognition on the voice signal to obtain the voice command.
S205, if the voice signal does not include the wake-up word, judging whether the characteristics of the voice signal source match the preset characteristics; if so, executing step S204, and if not, returning to step S201.
Specifically, the preset feature includes that the face of the user faces the controlled device/control device, or that the user is located on the front face of the controlled device.
In an embodiment, the preset feature may be that the face of the user faces the front of the control device, and when the face of the user faces the front of the control device, step S206 is executed.
S206: and judging whether the voice instruction comprises a control object.
Specifically, step S206 is after the step of performing voice instruction recognition on the voice signal to acquire the voice instruction.
In one embodiment, the control object includes at least one controlled device (e.g., a television, a smart speaker, an air conditioner, a washing machine, an electric lamp, an electronic curtain, an electronic door, a floor sweeping robot, or other devices).
S207, if the voice command does not include a control object, acquiring the household appliance control big data according to the current environment information and/or the current time information, and obtaining from the big data at least one household appliance together with the appliance control information corresponding to each appliance, so that each corresponding appliance is controlled according to its own control information.
In one embodiment, the voice command does not include a control object when, for example, the user gives the control device a fuzzy voice command (e.g., just "open"). Specifically, after acquiring the fuzzy voice command, the control device may select at least one household appliance according to the current environment information and/or the current time information and control it accordingly.
In one embodiment, the current environment information includes at least one of indoor temperature information, indoor brightness information, floor cleanliness information, and the number of people in the room, but is not limited to these.
In an embodiment, the control device may obtain from a cloud server the household appliance control big data corresponding to the current environment information and/or the current time information. The appliance control big data stored in the cloud server may be appliance control data, associated with environment and/or time information, that this user uploaded to the cloud server through the control device, or appliance control data, associated with environment and/or time information, that other users uploaded through other control terminals. In other words, the big data obtained is either this user's own appliance control data stored in the cloud or the appliance control data commonly used by other users stored in the cloud.
Specifically, the control device may obtain the appliance control big data corresponding to the current environment information. For example, when the illumination intensity is less than 50 lux (i.e., the room is dark), the indoor temperature is higher than 35 °C, or there is rubbish on the floor, the control device obtains the user's appliance control big data from the cloud server according to the current environment information and accordingly turns on the electric lamp or opens the curtain, turns on the air conditioner and sets its temperature, or starts the sweeping robot.
The control device can also obtain the appliance control big data corresponding to the current time information. For example, the user's appliance control big data for the time period from 5 am to 9 am may be to open the curtains, turn on the music player, or start the water dispenser heating; the user's appliance control big data for the period from 6 pm to 8 pm may be to turn on the television or the computer, or turn on the lights.
Furthermore, the control device may obtain the appliance control big data corresponding to both the current environment information and the current time information. For example, if the indoor temperature is higher than 35 °C and the time is between 7:00 pm and 7:30 pm, the control device may, according to the user's appliance control big data obtained for this environment and time, turn on the air conditioner and set its temperature, turn on the electric lamp, or turn on the television to a broadcast program (for example, the 7 pm news simulcast of the central television station).
In another embodiment, the control device obtains the user's appliance control big data corresponding to the current environment information and/or the current time information from data stored in the control device itself.
In one embodiment, the obtained appliance control big data includes at least one household appliance to be controlled and the control information corresponding to each appliance to be controlled. Therefore, when the voice command is obtained but includes no control object, the human-computer interaction method provided by this embodiment can intelligently select at least one household appliance and control it according to the current environment information and/or the current time, which not only greatly improves the accuracy of human-computer interaction but also makes the method more intelligent.
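A local rule table standing in for the cloud big-data lookup is sketched below; every threshold, time window and action is an illustrative assumption and would in practice come from the user's own or other users' appliance control data:

    from datetime import datetime

    def plan_appliance_actions(env: dict, now: datetime):
        """Pick appliances and control info for a fuzzy command such as "open"."""
        actions = []
        if env.get("illuminance_lux", 1000) < 50:     # room is dark
            actions.append(("electric_lamp", "turn_on"))
        if env.get("indoor_temp_c", 20) > 35:         # room is hot
            actions.append(("air_conditioner", "turn_on_set_26C"))
        if env.get("floor_dirty", False):             # rubbish on the floor
            actions.append(("sweeping_robot", "start"))
        if 5 <= now.hour < 9:                         # morning routine
            actions.append(("curtain", "open"))
        elif 18 <= now.hour < 20:                     # evening routine
            actions.append(("television", "turn_on"))
        return actions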
S208: and if the voice command comprises the control object, correspondingly controlling the control object according to the voice command.
In one embodiment, when the voice command includes only the control object but does not include control information for that object, the user's face is detected and the historical control information of the control object corresponding to that face is acquired, so that the control object is controlled according to the historical control information. For example, when the voice command is to turn on the television but contains no channel or program information, the control device may look up, according to the user's face, the program information and viewing-progress information (i.e., the historical control information) from the user's last viewing, so that after turning on the television it can control the television to resume the multimedia content corresponding to that program and viewing position.
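A minimal sketch of such a history lookup, with an assumed layout for the history store (program identifier plus playback position in seconds):

    def resume_for_face(face_id: str, history: dict) -> dict:
        """Turn the television on, resuming the recognized user's last program."""
        record = history.get(face_id)
        if record is None:
            return {"action": "turn_on"}          # no history: just turn it on
        return {
            "action": "turn_on_and_resume",
            "program": record["program"],         # last-watched program
            "position_s": record["position_s"],   # last viewing position in seconds
        }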
In an embodiment, the step of detecting the user's face may be performed at the same time as the detection of the characteristics of the voice signal source in step S202.
The human-computer interaction method provided by the second embodiment of the invention comprises: receiving a voice signal; detecting the characteristics of the voice signal source; and judging whether the voice signal includes a wake-up word. If the voice signal includes the wake-up word, voice command recognition is performed on the voice signal to obtain the voice command. If the voice signal does not include the wake-up word, whether the characteristics of the voice signal source match the preset characteristics is judged; if so, the voice command is obtained, and if not, the method returns to the step of receiving a voice signal. After the step of obtaining the voice command, whether the voice command includes a control object is judged. If the voice command does not include a control object, the household appliance control big data is acquired according to the current environment information and/or the current time information, and at least one household appliance together with the control information corresponding to each appliance is obtained from the big data, so that each appliance is controlled accordingly. If the voice command includes a control object, the control object is controlled according to the voice command. Therefore, when the voice signal does not include the wake-up word, the characteristics of the user who uttered it are detected and the step of obtaining the voice command is entered only when those characteristics match the preset characteristics, which improves the accuracy of human-computer interaction; and after the voice command is obtained, by judging whether it includes a control object, at least one control object can be selected according to the current environment information and/or the current time information and controlled correspondingly when none is specified, which greatly improves the intelligence of the method provided by this embodiment.
Example three:
Fig. 3 is a schematic structural diagram of the control device according to the third embodiment of the present invention. For a clear description of the control device 1 provided by the third embodiment, please refer to fig. 3.
Referring to fig. 3, the control device 1 according to the third embodiment of the present invention comprises: a voice signal receiving module 101, a feature detection module 102, a wake-up recognition module 103, and a voice command acquisition module 104.
Specifically, the voice signal receiving module 101 is configured to receive a voice signal.
In an embodiment, before the voice signal receiving module 101 receives the voice signal, the voice signal receiving module 101 is in a mute detection state, and the power consumption of the control device 1 is very low, so that the control device 1 maintains the capability of long-time operation.
Specifically, the feature detection module 102 is connected to the voice signal receiving module 101 and configured to detect a feature of a voice signal source, where the feature of the voice signal source includes a face orientation of a user who sends a voice signal or a relative orientation of the user and a controlled device.
In one embodiment, the feature detection module 102 includes an image acquisition device. In other embodiments, the feature detection module 102 may include an image acquisition device and/or a sound source localization device. The image acquisition device can be used for acquiring the image information of the voice signal source, so that the characteristics of the voice signal source are identified. The sound source positioning device can judge the direction of the voice signal source according to the received voice signal.
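The patent does not name a localization algorithm; one common way to judge the direction of a voice signal source from a two-microphone array is GCC-PHAT time-difference-of-arrival estimation, sketched here under that assumption:

    import numpy as np

    def gcc_phat_tdoa(sig, ref, fs, max_tau=None):
        """Estimate the time difference of arrival (in seconds) of a sound
        between two microphones; its sign indicates the side of the source."""
        n = len(sig) + len(ref)
        SIG = np.fft.rfft(sig, n=n)
        REF = np.fft.rfft(ref, n=n)
        r = SIG * np.conj(REF)
        r /= np.abs(r) + 1e-12                    # PHAT weighting
        cc = np.fft.irfft(r, n=n)
        max_shift = n // 2
        if max_tau is not None:
            max_shift = min(int(fs * max_tau), max_shift)
        cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
        shift = int(np.argmax(np.abs(cc))) - max_shift
        return shift / float(fs)

With the microphone spacing known, the delay converts to a bearing through the usual far-field relation (sin θ = c τ / d, where c is the speed of sound and d the spacing).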
Specifically, the wake-up recognition module 103 is connected to the voice signal receiving module 101, and is configured to determine whether the voice signal includes a wake-up word.
Specifically, the voice command acquisition module 104 is configured to perform voice command recognition on the voice signal to obtain a voice command when the voice signal includes the wake-up word and, when the voice signal does not include the wake-up word, to perform voice command recognition on the voice signal to obtain the voice command only when the characteristics of the voice signal source match the preset characteristics, where the preset characteristics include that the user's face is oriented toward the front of the controlled device/control device 1, or that the user is positioned in front of the controlled device.
In an embodiment, the voice command recognition module is configured to determine whether the voice command includes a control object. And if the voice command does not comprise a control object, acquiring the household appliance control big data according to the current environment information and/or the current time information, and acquiring at least one household appliance and household appliance control information corresponding to each household appliance according to the household appliance control big data so as to control the corresponding household appliances according to the household appliance control information respectively. And if the voice command comprises the control object, correspondingly controlling the control object according to the voice command.
In one embodiment, when the voice command only includes the control object but does not include the control information of the control object, the voice command recognition module detects a face of the user and acquires historical control information of the control object corresponding to the face, so as to perform corresponding control on the control object according to the historical control information.
In the control device 1 provided by the third embodiment of the invention, the voice signal receiving module 101 receives a voice signal. The feature detection module 102, connected to the voice signal receiving module 101, detects the characteristics of the voice signal source, the characteristics including the face orientation of the user who uttered the voice signal or the relative orientation of the user and a controlled device. The wake-up recognition module 103, connected to the voice signal receiving module 101, judges whether the voice signal includes a wake-up word. The voice command acquisition module 104 performs voice command recognition on the voice signal to obtain a voice command when the voice signal includes the wake-up word and, when it does not, only when the characteristics of the voice signal source match the preset characteristics, the preset characteristics including that the user's face is oriented toward the front of the controlled device/control device 1 or that the user is positioned in front of the controlled device. Therefore, during human-computer interaction, when a received voice signal does not include the wake-up word, the control device 1 can decide from the user's face orientation or from the relative orientation of the user and the controlled device whether voice command recognition should be performed on the voice signal, and then control the controlled device according to the voice command. This effectively avoids false triggering of the controlled device or the control device 1 while the user is speaking in natural language (e.g., chatting), and greatly improves the accuracy of the control device 1 during human-computer interaction.
Example four:
Fig. 4 is a schematic structural diagram of the controlled device according to the fourth embodiment of the present invention. For a clear description of the controlled device 2 provided by the fourth embodiment, please refer to fig. 4.
Referring to fig. 4, the controlled device 2 includes a control device provided by the present invention (for example, the control device 1 provided by the third embodiment). Specifically, the control device 1 can implement the human-computer interaction method provided by the present invention (for example, the method provided by the first embodiment and/or the second embodiment).
Therefore, during human-computer interaction, when a received voice signal does not include the wake-up word, the controlled device 2 provided by this embodiment can decide from the user's face orientation or from the relative orientation between the user and the controlled device 2 whether to perform voice command recognition on the voice signal, and then control the controlled device 2 according to the voice command. This effectively avoids false triggering of the controlled device 2 while the user is speaking in natural language (e.g., chatting), and thus greatly improves the accuracy of the controlled device 2 during human-computer interaction.
Example five:
Fig. 5 is a schematic structural diagram of the control device according to the fifth embodiment of the present invention. For a clear description of the control device provided by the fifth embodiment, please refer to fig. 5.
The control device provided by the fifth embodiment of the present invention includes a processor A101 configured to execute the computer program A6 stored in a memory A201 to implement the steps of the human-computer interaction method described in the first or the second embodiment.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, apparatus, or program product. Thus, aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining software and hardware, which may be referred to herein generally as a "circuit", "module", or "system".
In an embodiment, the control device provided in this embodiment may include at least one processor A101 and at least one memory A201, where the at least one processor A101 may be referred to as the processing unit A1 and the at least one memory A201 as the storage unit A2. Specifically, the storage unit A2 stores a computer program A6; when the computer program A6 is executed by the processing unit A1, the control device provided by this embodiment implements the steps of the human-computer interaction method described above, such as step S206 shown in fig. 2 (judging whether the voice command includes a control object), or step S105 shown in fig. 1 (judging whether the characteristics of the voice signal source match the preset characteristics).
In one embodiment, the control device further comprises a bus connecting its different components (e.g., the processor A101 and the memory A201). The bus may represent one or more of several types of bus structures, including a memory bus or memory-controller bus, a peripheral bus, and the like.
Referring to fig. 5, in an embodiment, the control device provided in this embodiment includes a plurality of memories A201 (collectively, the storage unit A2); the storage unit A2 may include, for example, a random access memory (RAM) and/or a cache memory and/or a read-only memory (ROM), and the like.
Referring to fig. 5, in an embodiment, the control device in this embodiment may further include a communication interface (e.g., an I/O interface A4), and the communication interface may be used to communicate with an external device (e.g., a computer, a smart terminal, etc.).
Referring to fig. 5, in an embodiment, the control device in this embodiment may further include a display device and/or an input device (e.g., the illustrated touch display screen A3).
Referring to fig. 5, in an embodiment, the control device provided in this embodiment may further include a network adapter A5, where the network adapter A5 may be used to communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, etc.). As shown in fig. 5, the network adapter A5 may communicate with other components of the control apparatus through wires.
The control device provided in this embodiment can implement the steps of the human-computer interaction method provided in the present invention, and for specific implementation and beneficial effects, reference may be made to the first embodiment and the second embodiment of the present invention, which will not be described herein again.
Example six:
in an embodiment of the present invention, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, is capable of implementing the steps of the human-computer interaction method of, for example, the first embodiment or the second embodiment. Alternatively, the computer program can realize the functions of the controller and the controlled device when executed by the processor.
In this embodiment, the computer program in the computer-readable storage medium, when executed by a processor, implements the steps of the human-computer interaction method or the functions of the control device or the controlled device; for specific implementations and beneficial effects, refer to embodiments one to four of the invention, which are not repeated here.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalents or improvements made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A human-computer interaction method is characterized by comprising the following steps:
receiving a voice signal;
detecting characteristics of a voice signal source, wherein the characteristics of the voice signal source comprise the face orientation of a user who utters the voice signal or the relative orientation of the user and a controlled device/control device;
judging whether the voice signal comprises a wake-up word or not;
if the voice signal comprises the wake-up word, performing voice instruction recognition on the voice signal to acquire a voice instruction;
if the voice signal does not comprise the wake-up word, performing voice instruction recognition on the voice signal to acquire a voice instruction when the characteristics of the voice signal source accord with preset characteristics, wherein the preset characteristics comprise that the face of the user faces the front face of the controlled device/control device or the user is positioned on the front face of the controlled device/control device;
judging whether the voice instruction comprises a control object or not;
if the voice instruction comprises a control object and does not comprise control information of the control object, detecting the face of a user, and acquiring historical control information of the control object corresponding to the face so as to correspondingly control the control object according to the historical control information;
if the voice command does not comprise the control object, acquiring household appliance control big data according to current environment information and/or current time information, and acquiring at least one household appliance and household appliance control information corresponding to each household appliance according to the household appliance control big data so as to control the corresponding household appliance according to the household appliance control information respectively.
2. The human-computer interaction method of claim 1, wherein the step of detecting the characteristics of the source of the voice signal comprises:
detecting the time for which the user's eye is focused on the controlled device/the control device;
the preset characteristic further comprises that the time when the eyeball of the user focuses on the controlled device/the control device is larger than a threshold value.
3. The human-computer interaction method of claim 1, wherein the step of determining whether the voice signal includes a wake-up word is preceded by:
acquiring the face of the user, and judging whether the face of the user is matched with a specific face stored in advance;
when the face of the user is matched with the specific face stored in advance, entering the step of judging whether the voice signal comprises the wake-up word;
and returning to the step of receiving the voice signal when the face does not match the specific face stored in advance.
4. The human-computer interaction method of claim 1, wherein after the step of performing voice command recognition on the voice signal to obtain a voice command, the method comprises:
and entering a man-machine conversation mode according to the voice command, outputting corresponding conversation voice and/or performing corresponding control according to the voice command.
5. The human-computer interaction method according to claim 1, wherein the control object comprises at least one household appliance.
6. A human-computer interaction method according to claim 1, wherein the control object comprises a television and/or a music player and/or a lamp.
7. A control device, comprising:
a voice signal receiving module, configured to receive a voice signal;
a characteristic detection module, connected with the voice signal receiving module and configured to detect characteristics of a voice signal source, wherein the characteristics of the voice signal source comprise the face orientation of the user who sends the voice signal, or the relative orientation of the user and a controlled device/control apparatus;
a wake-up word recognition module, connected with the voice signal receiving module and configured to judge whether the voice signal comprises a wake-up word; and
a voice instruction acquisition module, configured to perform voice instruction recognition on the voice signal to acquire a voice instruction when the voice signal comprises the wake-up word, and to perform voice instruction recognition on the voice signal to acquire a voice instruction when the voice signal does not comprise the wake-up word and the characteristics of the voice signal source conform to preset characteristics, wherein the preset characteristics comprise that the face of the user faces the front of the controlled device/control apparatus, or that the user is located in front of the controlled device/control apparatus;
wherein the control device is further configured to:
judge whether the voice instruction comprises a control object;
if the voice instruction comprises a control object but does not comprise control information for the control object, detect the face of the user and acquire historical control information of the control object corresponding to the face, so as to control the control object according to the historical control information; and
if the voice instruction does not comprise a control object, acquire household appliance control big data according to current environment information and/or current time information, and acquire at least one household appliance and household appliance control information corresponding to each household appliance according to the household appliance control big data, so as to control each household appliance according to its corresponding household appliance control information.
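The module structure of claim 7 could be composed as below; the class and method names are placeholders for the claimed modules, not an API defined by the patent.

```python
# Hypothetical composition of the claim-7 control device. Class and method
# names stand in for the claimed modules; none are defined by the patent.
class VoiceSignalReceivingModule:
    def receive(self):
        return b""                     # placeholder: captured audio

class CharacteristicDetectionModule:
    def detect(self, signal) -> dict:
        return {"conforms": False}     # placeholder: orientation/position

class WakeUpWordRecognitionModule:
    def has_wake_word(self, signal) -> bool:
        return False                   # placeholder: keyword spotting

class VoiceInstructionAcquisitionModule:
    def recognize(self, signal) -> dict:
        return {}                      # placeholder: ASR + intent parsing

class ControlDevice:
    def __init__(self):
        self.receiver = VoiceSignalReceivingModule()
        self.characteristics = CharacteristicDetectionModule()
        self.wake_word = WakeUpWordRecognitionModule()
        self.instructions = VoiceInstructionAcquisitionModule()

    def step(self):
        signal = self.receiver.receive()
        source = self.characteristics.detect(signal)
        # Wake word, or preset source characteristics, both admit the
        # signal to instruction recognition.
        if self.wake_word.has_wake_word(signal) or source.get("conforms"):
            return self.instructions.recognize(signal)
        return None
```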
8. A controlled device, characterized in that the controlled device comprises the control device according to claim 7.
9. A control apparatus, characterized in that the control apparatus comprises a processor configured to execute a computer program stored in a memory to implement the steps of the human-computer interaction method according to any one of claims 1-5.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the human-computer interaction method according to any one of claims 1-5.
CN201810955004.3A 2018-08-21 2018-08-21 Man-machine interaction method, control device, controlled device and storage medium Active CN110853619B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810955004.3A CN110853619B (en) 2018-08-21 2018-08-21 Man-machine interaction method, control device, controlled device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810955004.3A CN110853619B (en) 2018-08-21 2018-08-21 Man-machine interaction method, control device, controlled device and storage medium

Publications (2)

Publication Number Publication Date
CN110853619A CN110853619A (en) 2020-02-28
CN110853619B CN110853619B (en) 2022-11-25

Family

ID=69594558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810955004.3A Active CN110853619B (en) 2018-08-21 2018-08-21 Man-machine interaction method, control device, controlled device and storage medium

Country Status (1)

Country Link
CN (1) CN110853619B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113359538A (en) * 2020-03-05 2021-09-07 东元电机股份有限公司 Voice control robot
CN111443801B (en) * 2020-03-25 2023-10-13 北京百度网讯科技有限公司 Man-machine interaction method, device, equipment and storage medium
CN111562346A (en) * 2020-05-06 2020-08-21 江苏美的清洁电器股份有限公司 Control method, device and equipment of dust collection station, dust collection station and storage medium
CN111739533A (en) * 2020-07-28 2020-10-02 睿住科技有限公司 Voice control system, method and device, storage medium and voice equipment
CN112102546A (en) * 2020-08-07 2020-12-18 浙江大华技术股份有限公司 Man-machine interaction control method, talkback calling method and related device
CN115086095A (en) * 2021-03-10 2022-09-20 Oppo广东移动通信有限公司 Equipment control method and related device
CN115083402A (en) * 2021-03-15 2022-09-20 Oppo广东移动通信有限公司 Method, device, terminal and storage medium for responding control voice
CN113470658A (en) * 2021-05-31 2021-10-01 翱捷科技(深圳)有限公司 Intelligent earphone and voice awakening threshold value adjusting method thereof
CN113470660A (en) * 2021-05-31 2021-10-01 翱捷科技(深圳)有限公司 Voice wake-up threshold adjusting method and system based on router flow
CN113470659A (en) * 2021-05-31 2021-10-01 翱捷科技(深圳)有限公司 Light intensity-based voice awakening threshold value adjusting method and device
CN113421567A (en) * 2021-08-25 2021-09-21 江西影创信息产业有限公司 Terminal equipment control method and system based on intelligent glasses and intelligent glasses
CN114253396A (en) * 2021-11-15 2022-03-29 青岛海尔空调电子有限公司 Target control method, device, equipment and medium
CN115588435A (en) * 2022-11-08 2023-01-10 荣耀终端有限公司 Voice wake-up method and electronic equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009244912A (en) * 2009-07-29 2009-10-22 Victor Co Of Japan Ltd Speech signal processing device and speech signal processing method
CN102945029A (en) * 2012-10-31 2013-02-27 鸿富锦精密工业(深圳)有限公司 Intelligent gateway, smart home system and intelligent control method for home appliance equipment
CN104238369A (en) * 2014-09-02 2014-12-24 百度在线网络技术(北京)有限公司 Intelligent household appliance control method and device
CN105700363A (en) * 2016-01-19 2016-06-22 深圳创维-Rgb电子有限公司 Method and system for waking up smart home equipment voice control device
CN105703978A (en) * 2014-11-24 2016-06-22 武汉物联远科技有限公司 Smart home control system and method
CN105912092A (en) * 2016-04-06 2016-08-31 北京地平线机器人技术研发有限公司 Voice waking up method and voice recognition device in man-machine interaction
WO2016167004A1 (en) * 2015-04-14 2016-10-20 シャープ株式会社 Voice recognition system
CN107908116A (en) * 2017-10-20 2018-04-13 深圳市艾特智能科技有限公司 Sound control method, intelligent domestic system, storage medium and computer equipment
CN108320742A (en) * 2018-01-31 2018-07-24 广东美的制冷设备有限公司 Voice interactive method, smart machine and storage medium
CN108320753A (en) * 2018-01-22 2018-07-24 珠海格力电器股份有限公司 Control method, the device and system of electrical equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102392113B1 (en) * 2016-01-20 2022-04-29 삼성전자주식회사 Electronic device and method for processing voice command thereof


Also Published As

Publication number Publication date
CN110853619A (en) 2020-02-28

Similar Documents

Publication Publication Date Title
CN110853619B (en) Man-machine interaction method, control device, controlled device and storage medium
CN111989741B (en) Speech-based user interface with dynamically switchable endpoints
CN107370649B (en) Household appliance control method, system, control terminal and storage medium
CN105471705B (en) Intelligent control method, equipment and system based on instant messaging
JP6516585B2 (en) Control device, method thereof and program
CN105700389B (en) Intelligent home natural language control method
CN105118257B (en) Intelligent control system and method
CN109240111A (en) Intelligent home furnishing control method, device, system and intelligent gateway
CN106440192A (en) Household appliance control method, device and system and intelligent air conditioner
JP2005284492A (en) Operating device using voice
WO2017141530A1 (en) Information processing device, information processing method and program
TWI521385B (en) And a control system and a method for driving the corresponding device according to the triggering strategy
CN111754997B (en) Control device and operation method thereof, and voice interaction device and operation method thereof
CN111817936A (en) Control method and device of intelligent household equipment, electronic equipment and storage medium
JP7262532B2 (en) VOICE INTERACTIVE PROCESSING METHOD, APPARATUS, ELECTRONIC DEVICE, STORAGE MEDIUM AND PROGRAM
CN114067798A (en) Server, intelligent equipment and intelligent voice control method
CN113593544A (en) Device control method and apparatus, storage medium, and electronic apparatus
CN112838967B (en) Main control equipment, intelligent home and control device, control system and control method thereof
CN110632854A (en) Voice control method and device, voice control node and system and storage medium
CN116582382B (en) Intelligent device control method and device, storage medium and electronic device
US10564614B2 (en) Progressive profiling in an automation system
CN109976169B (en) Internet television intelligent control method and system based on self-learning technology
US11818820B2 (en) Adapting a lighting control interface based on an analysis of conversational input
CN113641105A (en) Household appliance control method, device, equipment and storage medium
CN114815635A (en) Computer readable storage medium, intelligent panel and voice interaction method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant