CN108320742B - Voice interaction method, intelligent device and storage medium - Google Patents


Info

Publication number
CN108320742B
CN108320742B (application CN201810100378.7A)
Authority
CN
China
Prior art keywords
user
voice
recognition engine
distance
volume value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810100378.7A
Other languages
Chinese (zh)
Other versions
CN108320742A (en)
Inventor
梁文华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Midea Group Co Ltd
GD Midea Air Conditioning Equipment Co Ltd
Original Assignee
Midea Group Co Ltd
GD Midea Air Conditioning Equipment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Midea Group Co Ltd, GD Midea Air Conditioning Equipment Co Ltd filed Critical Midea Group Co Ltd
Priority to CN201810100378.7A priority Critical patent/CN108320742B/en
Publication of CN108320742A publication Critical patent/CN108320742A/en
Application granted granted Critical
Publication of CN108320742B publication Critical patent/CN108320742B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26: Speech to text systems
    • G10L2015/088: Word spotting
    • G10L2015/223: Execution procedure of a spoken command

Abstract

The invention provides a voice interaction method applied to an intelligent device with a built-in speech recognition engine. When a user is detected entering a preset area, the speech recognition engine of the intelligent device is woken up, and the distance between the user and the device is acquired so that the user can be prompted to issue a voice command at the volume value corresponding to that distance; the command issued by the user is then recognized by the speech recognition engine so as to control the device to execute the corresponding action. The invention also provides the intelligent device and a storage medium. The invention avoids the cumbersome process of waking the speech recognition engine with a wake word, reduces the wake-up difficulty of voice interaction on the intelligent device, avoids misrecognition of environmental noise, and improves the recognition accuracy of the device's voice interaction.

Description

Voice interaction method, intelligent device and storage medium
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular to a voice interaction method, an intelligent device, and a storage medium.
Background
With the development of technology, voice interaction has become an important interaction mode in many intelligent products, such as smart speakers. A user can wake such a product with a specific wake word and then control it with voice commands, freeing both hands.
However, waking a smart product with a wake word is often difficult, and becomes even harder when environmental noise is loud; misrecognition also occurs, so recognition accuracy is poor.
Disclosure of Invention
The main purpose of the present invention is to provide a voice interaction method that reduces the wake-up difficulty of voice interaction on intelligent devices and improves recognition accuracy.
To achieve the above object, the voice interaction method provided by the present invention is applied to an intelligent device with a built-in speech recognition engine, and comprises the following steps:
when a user is detected entering a preset area, waking the speech recognition engine of the intelligent device;
acquiring the distance between the user and the intelligent device;
prompting the user to issue a voice command at the volume value corresponding to the distance;
and recognizing, with the speech recognition engine, the voice command issued by the user, so as to control the intelligent device to execute the action corresponding to the command.
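The four steps above can be sketched as a small state machine. The following is an illustrative reconstruction only, not the patent's implementation; the class name, method names, and the 3 m preset-area radius are all assumptions.

```python
# Hypothetical sketch of the claimed flow; all names and thresholds are
# illustrative assumptions, not taken from the patent's implementation.

class SmartDevice:
    def __init__(self, preset_radius_m: float = 3.0):
        self.preset_radius_m = preset_radius_m  # boundary of the preset area
        self.engine_awake = False

    def on_presence(self, user_distance_m: float) -> str:
        """Wake the recognition engine when the user enters the preset area."""
        if user_distance_m > self.preset_radius_m:
            self.engine_awake = False
            return "outside preset area"
        self.engine_awake = True  # woken by presence, not by a wake word
        # Next step: prompt the user to speak at the volume matching the distance
        return f"awake; prompt user at {user_distance_m:.1f} m"
```

A real device would replace the returned strings with hardware actions, such as lighting the indicator strip and feeding microphone audio to the recognition engine.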
Further, before the step of waking the speech recognition engine of the intelligent device when the user is detected entering the preset area, the method further comprises:
acquiring human-body detection information within the preset area; or, alternatively,
receiving human-body detection information within the preset area fed back by another terminal.
Further, the intelligent device is also equipped with an infrared human-body detection apparatus, and the step of acquiring human-body detection information within the preset area comprises:
after receiving a start instruction, controlling the infrared human-body detection apparatus to monitor the preset area so as to detect in real time whether a user has entered it.
Further, before the step of prompting the user to issue a voice command at the volume value corresponding to the distance, the method further comprises:
judging, according to the acquired distance parameter, whether the user is within the recognition range of the speech recognition engine;
if so, executing the step of prompting the user to issue a voice command at the volume value corresponding to the distance.
Further, the intelligent device is also provided with an indicator light strip, and the step of judging whether the user is within the recognition range of the speech recognition engine according to the acquired distance parameter comprises:
displaying the distance parameter between the user and the intelligent device, as acquired by the infrared human-body detection module, as a lit length of the indicator strip according to a pre-stored mapping between distance and lit length;
when the lit length is greater than or equal to a preset length, judging that the user is within the recognition range;
and when the lit length is less than the preset length, judging that the user is out of the recognition range.
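Under stated assumptions (a 10-LED strip, a 5 m maximum displayable distance, and a linear mapping), the distance-to-lit-length display and the in-range test described above might look like:

```python
# Illustrative sketch of the pre-stored distance -> lit-length mapping;
# the strip size, range, and threshold are invented example values.

STRIP_LEDS = 10            # total LEDs on the indicator strip (assumed)
MAX_RANGE_M = 5.0          # farthest displayable distance (assumed)
PRESET_LIT_LENGTH = 4      # "preset length" threshold in LEDs (assumed)

def lit_length(distance_m: float) -> int:
    """Closer users light more LEDs: full strip at 0 m, none beyond range."""
    if distance_m >= MAX_RANGE_M:
        return 0
    return round(STRIP_LEDS * (1 - distance_m / MAX_RANGE_M))

def in_recognition_range(distance_m: float) -> bool:
    """The user is in range when the lit length reaches the preset length."""
    return lit_length(distance_m) >= PRESET_LIT_LENGTH
```

The linear mapping doubles as user feedback: the strip visibly shortens as the user steps away, signalling that they are approaching the edge of the recognition range.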
Further, when the user is judged to be out of the recognition range of the speech recognition engine, the engine is turned off, or a prompt is output asking the user to move closer to the intelligent device.
Further, the step of prompting the user to issue a voice command at the volume value corresponding to the distance comprises:
receiving a voice command within the recognition range of the speech recognition engine and calculating its volume value;
acquiring the target volume value matched to the current distance according to a pre-stored mapping between distance and target volume value;
judging whether the volume value is less than the target volume value;
if so, controlling the indicator strip to light up so as to indicate that the currently received voice command is invalid;
if not, controlling the indicator strip to light up fully so as to indicate that the currently received voice command is valid.
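A minimal sketch of this volume-validation step, assuming a fabricated distance-to-target-volume table standing in for the "pre-stored mapping" (the dB values are invented):

```python
# The table below is a made-up example of the patent's "pre-stored mapping
# between distance and target volume value": (max distance in m, target dB).
DISTANCE_TO_TARGET_DB = [(1.0, 45.0), (2.0, 50.0), (3.0, 55.0)]

def target_volume(distance_m: float) -> float:
    """Look up the minimum volume a command must reach at this distance."""
    for max_distance, db in DISTANCE_TO_TARGET_DB:
        if distance_m <= max_distance:
            return db
    return DISTANCE_TO_TARGET_DB[-1][1]  # beyond the table: loudest target

def command_is_valid(distance_m: float, measured_db: float) -> bool:
    """Commands quieter than the target volume are treated as ambient noise."""
    return measured_db >= target_volume(distance_m)
```

On a valid command the device would light the strip fully; on an invalid one it would signal rejection, filtering out background noise that happens to reach the microphone.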
Further, the step of recognizing the voice command issued by the user so as to control the intelligent device to execute the action corresponding to a valid voice command comprises:
when the currently received voice command is valid, recognizing as text, based on an acoustic model and grammatical structure, the command corresponding to the moment the indicator strip is fully lit;
after syntactic and/or semantic analysis of the text, extracting a text segment that matches a preset keyword pointing to the user's intention;
and judging the user's intention from the text segment, then, according to that intention, outputting interactive data in response to the user or controlling the intelligent device to execute a feedback action.
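The keyword-matching step can be illustrated with a toy lookup; the keywords and intent names below are invented examples of the "preset keywords pointing to the user intention", not values from the patent:

```python
from typing import Optional

# Invented example keywords; a real device would ship its own preset list.
INTENT_KEYWORDS = {
    "turn on": "power_on",
    "turn off": "power_off",
    "temperature": "report_temperature",
}

def extract_intent(recognized_text: str) -> Optional[str]:
    """Return the intent of the first preset keyword found in the text."""
    lowered = recognized_text.lower()
    for keyword, intent in INTENT_KEYWORDS.items():
        if keyword in lowered:
            return intent
    return None  # no keyword matched: no recognizable user intention
```

The matched intent would then drive either an interactive reply (e.g. speaking the temperature) or a feedback action (e.g. switching the device on).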
The present invention further provides an intelligent device comprising a speech recognition engine, a memory, a processor, and a voice interaction program stored in the memory and executable on the processor, wherein:
the speech recognition engine is used to recognize received voice commands;
and the voice interaction program, when executed by the processor, implements the steps of the voice interaction method described above.
The present invention also provides a storage medium storing a voice interaction program which, when executed by a processor, implements the steps of the voice interaction method described above.
The voice interaction method is applied to an intelligent device with a built-in speech recognition engine: when a user is detected entering a preset area, the engine is woken up and the distance between the user and the device is acquired; the user is prompted to issue a voice command at the volume value corresponding to that distance, and the command is recognized by the engine so as to control the device to execute the corresponding action. Because the engine is woken as soon as the user enters the preset area, the user need not first utter a specific wake word and then issue a control command; the number of interactions required to wake the device is reduced, the cumbersome wake-word process is avoided, and the wake-up difficulty of voice interaction is lowered. Moreover, the method only recognizes voice commands whose volume matches the user's distance, which avoids misrecognizing environmental noise and improves the device's recognition accuracy for voice commands.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. Obviously, the drawings described here show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a hardware structure of an embodiment of an intelligent device according to the present invention;
FIG. 2 is a flowchart illustrating a voice interaction method according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a voice interaction method according to a second embodiment of the present invention;
FIG. 4 is a flowchart illustrating an embodiment of step S50 in FIGS. 2 and 3;
FIG. 5 is a flowchart illustrating an embodiment of step S60 in FIGS. 2 and 3;
FIG. 6 is a flowchart illustrating an embodiment of step S70 in FIGS. 2 and 3.
The reference numerals illustrate:

100  Intelligent device      101   Radio frequency unit
102  WiFi module             103   Audio output unit
104  A/V input unit          1041  Graphics processor
1042 Microphone              105   Sensor
106  Display unit            1061  Display interface
107  User input unit         1071  Control interface
1072 Other input devices     108   Interface unit
109  Memory                  110   Processor
111  Power supply
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements serve only to facilitate the description of the present invention and have no specific meaning in themselves; thus "module", "component", and "unit" may be used interchangeably.
Smart devices may be implemented in various forms. For example, the smart device described in the present invention may be a mobile terminal with a display interface, such as a mobile phone, tablet computer, notebook computer, palmtop computer, personal digital assistant (PDA), portable media player (PMP), navigation device, wearable device, smart band, pedometer, or smart speaker, or a fixed terminal with a display interface, such as a digital TV, desktop computer, air conditioner, refrigerator, water heater, or vacuum cleaner.
While the following description takes a mobile smart device as an example, those skilled in the art will appreciate that, apart from elements used specifically for mobile purposes, the configuration according to the embodiments of the present invention can also be applied to fixed smart devices.
Referring to fig. 1, a schematic diagram of a hardware structure of an intelligent device for implementing various embodiments of the present invention, the intelligent device 100 may include: a radio frequency (RF) unit 101, a WiFi module 102, an audio output unit 103, an A/V (audio/video) input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, and a power supply 111. Those skilled in the art will appreciate that the architecture shown in fig. 1 does not limit the smart device, which may include more or fewer components than shown, combine some components, or arrange the components differently.
The following describes each component of the smart device in detail with reference to fig. 1:
the radio frequency unit 101 may be configured to receive and transmit signals during information transmission and reception or during a call, and specifically, receive downlink information of a base station and then process the downlink information to the processor 110; in addition, the uplink data is transmitted to the base station. Typically, radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 can also communicate with a network and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA2000(Code Division Multiple Access2000 ), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division duplex Long Term Evolution), and TDD-LTE (Time Division duplex Long Term Evolution).
WiFi is a short-range wireless transmission technology. Through the WiFi module 102, the smart device can help the user send and receive e-mail, browse web pages, access streaming media, and the like, providing wireless broadband Internet access. Although fig. 1 shows the WiFi module 102, it is not an essential part of the smart device and may be omitted as needed without changing the essence of the invention. For example, in this embodiment, the smart device 100 may establish a synchronization association with an App terminal via the WiFi module 102.
The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the WiFi module 102 or stored in the memory 109 into an audio signal and output as sound when the smart device 100 is in a call signal reception mode, a call mode, a recording mode, a voice recognition mode, a broadcast reception mode, or the like. Also, the audio output unit 103 may also provide audio output related to a specific function performed by the smart device 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 103 may include a speaker, a buzzer, and the like.
The A/V input unit 104 is used to receive audio or video signals. It may include a graphics processing unit (GPU) 1041 and a microphone 1042. The graphics processor 1041 processes image data of still pictures or video obtained by an image capture device (e.g., a camera) in a video capture or image capture mode; the processed image frames may be displayed on the display unit 106, stored in the memory 109 (or another storage medium), or transmitted via the radio frequency unit 101 or the WiFi module 102. In a phone call mode, recording mode, voice recognition mode, or the like, the microphone 1042 can receive sound and process it into audio data; in the case of a phone call, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 101. The microphone 1042 may implement various noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated while receiving and transmitting audio signals.
The smart device 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, or another sensor. Specifically, the light sensor includes an ambient light sensor, which can adjust the brightness of the display interface 1061 according to ambient light, and a proximity sensor, which can turn off the display interface 1061 and/or its backlight when the smart device 100 is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally three axes) and, when stationary, the magnitude and direction of gravity; it can be used for posture-recognition applications (such as switching between horizontal and vertical screens, related games, and magnetometer posture calibration) and vibration-recognition functions (such as a pedometer or tap detection). Other sensors that may be configured on the device, such as a fingerprint sensor, pressure sensor, iris sensor, molecular sensor, gyroscope, barometer, hygrometer, thermometer, or infrared sensor, are not described further here.
The display unit 106 is used to display information input by the user or provided to the user. It may include a display interface 1061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The user input unit 107 may be used to receive input numeric or character information and to generate key-signal inputs related to user settings and function control of the smart device. In particular, the user input unit 107 may include a control interface 1071 and other input devices 1072. The control interface 1071, also referred to as a touch screen, collects the user's touch operations on or near it (e.g., operations performed with a finger, stylus, or any other suitable object or attachment) and drives the corresponding connected device according to a predetermined program. The control interface 1071 may comprise a touch detection device and a touch controller: the touch detection device detects the position of the user's touch and the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information, converts it into touch-point coordinates, sends the coordinates to the processor 110, and can receive and execute commands sent by the processor 110. The control interface 1071 can be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the control interface 1071, the user input unit 107 may include other input devices 1072, which may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys or a power switch), a trackball, a mouse, and a joystick.
Further, the control interface 1071 may overlay the display interface 1061. When the control interface 1071 detects a touch operation on or near it, the operation is transmitted to the processor 110 to determine the type of touch event, and the processor 110 then provides a corresponding visual output on the display interface 1061 according to that type. Although in fig. 1 the control interface 1071 and the display interface 1061 are two separate components implementing the input and output functions of the smart device, in some embodiments they may be integrated; this is not limited here.
The interface unit 108 serves as an interface through which at least one external device is connected to the smart device 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the smart device 100 or may be used to transmit data between the smart device 100 and the external device.
The memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a speech recognition engine, etc.) required for at least one function, and the like; the storage data area may store data (such as interactive data, control instructions, networking devices, etc.) created according to the use of the smart device, and the like. Further, the memory 109 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The processor 110 is a control center of the smart device, connects various parts of the entire smart device using various interfaces and lines, and performs various functions of the smart device and processes data by operating or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, thereby performing overall monitoring of the smart device. Processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
The smart device 100 may further include a power source 111 (such as a battery) for supplying power to various components, and preferably, the power source 111 may be logically connected to the processor 110 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system.
Although not shown in fig. 1, the smart device 100 may further include a bluetooth module and the like capable of establishing a communication connection with other terminals, which will not be described herein.
Based on the hardware structure above, the intelligent device provided by the embodiment of the invention has a built-in speech recognition engine. When the device detects that a user has entered the preset area, the engine is woken up, the distance between the user and the device is acquired, the user is prompted to issue a voice command at the volume value corresponding to the distance, and the command is recognized by the engine so as to control the device to execute the corresponding action. Because the engine is woken as soon as the user enters the preset area, the user need not utter a specific wake word before issuing control commands; the number of interactions required to wake the device is reduced, the cumbersome wake-word process is avoided, and the wake-up difficulty of voice interaction is lowered. Furthermore, only voice commands whose volume matches the user's distance are recognized, which avoids misrecognizing environmental noise and improves the recognition accuracy of the device's voice interaction.
As shown in fig. 1, the memory 109, which is a kind of computer storage medium, may include therein an operating system and a control program.
In the intelligent device 100 shown in fig. 1, the WiFi module 102 is mainly used for connecting to a background server or a big data cloud, performing data communication with the background server or the big data cloud, and implementing communication connection with other terminal devices; the processor 110 may be configured to invoke the voice interaction program stored in the memory 109 and perform the following operations:
when a user is detected entering a preset area, waking the speech recognition engine of the intelligent device;
acquiring the distance between the user and the intelligent device;
prompting the user to issue a voice command at the volume value corresponding to the distance;
and recognizing, with the speech recognition engine, the voice command issued by the user, so as to control the intelligent device to execute the action corresponding to the command.
Further, the processor 110 may be further configured to call the voice interaction program stored in the memory 109 to perform the following operations:
acquiring human-body detection information within the preset area; or, alternatively,
receiving human-body detection information within the preset area fed back by another terminal.
Further, the intelligent device is provided with an infrared human-body detection apparatus, and the processor 110 may be further configured to call the voice interaction program stored in the memory 109 to perform the following operations:
after receiving a start instruction, controlling the infrared human-body detection apparatus to monitor the preset area so as to detect in real time whether a user has entered it.
Further, the processor 110 may be further configured to call the voice interaction program stored in the memory 109 to perform the following operations:
judging, according to the acquired distance parameter, whether the user is within the recognition range of the speech recognition engine;
if so, executing the step of prompting the user to issue a voice command at the volume value corresponding to the distance.
Further, the intelligent device is further provided with an indicator light strip, and the processor 110 may be further configured to call the voice interaction program stored in the memory 109 to perform the following operations:
displaying the distance parameter between the user and the intelligent device, as acquired by the infrared human-body detection module, as a lit length of the indicator strip according to a pre-stored mapping between distance and lit length;
when the lit length is greater than or equal to a preset length, judging that the user is within the recognition range;
and when the lit length is less than the preset length, judging that the user is out of the recognition range.
Further, the intelligent device is provided with an indicator light strip, and the processor 110 may be further configured to call the voice interaction program stored in the memory 109 to perform the following operations:
when the user is judged to be out of the recognition range of the speech recognition engine, turning off the engine or outputting a prompt asking the user to move closer to the intelligent device.
Further, the intelligent device is provided with an indicator light strip, and the processor 110 may be further configured to call the voice interaction program stored in the memory 109 to perform the following operations:
receiving a voice command within the recognition range of the speech recognition engine and calculating its volume value;
acquiring the target volume value matched to the current distance according to a pre-stored mapping between distance and target volume value;
judging whether the volume value is less than the target volume value;
if so, controlling the indicator strip to light up so as to indicate that the currently received voice command is invalid;
if not, controlling the indicator strip to light up fully so as to indicate that the currently received voice command is valid.
Further, the processor 110 may be further configured to call the voice interaction program stored in the memory 109 to perform the following operations:
when the currently received voice instruction is valid, that is, when the indicator light strip is fully lit, recognizing the corresponding voice instruction as text information based on an acoustic model and grammatical structure;
after performing syntactic and/or semantic analysis on the text information, extracting a text segment that matches a preset keyword pointing to the user's intention;
and judging the user's intention based on the text segment, and outputting interactive data responding to the user according to that intention or controlling the intelligent device to execute a feedback action responding to the user's intention.
The invention further provides a voice interaction method which is applied to intelligent equipment.
Referring to fig. 2 or 3, fig. 2 is a schematic flowchart of a first embodiment of a voice interaction method of the present invention, and fig. 3 is a schematic flowchart of a second embodiment of the voice interaction method of the present invention.
In this embodiment, a speech recognition engine is built in the smart device, and the speech interaction method includes the following steps:
S30: when detecting that a user enters a preset area, awakening a voice recognition engine of the intelligent device;
In this embodiment, the smart device is provided with a speech recognition engine and a microphone or microphone array electrically connected to the speech recognition engine; the speech recognition engine recognizes the picked-up speech to obtain a recognition result.
In order to reduce the power consumption of the intelligent device, the speech recognition engine is normally kept in a closed state, and a wake-up instruction is triggered to wake it only when a user is detected entering the preset area. The microphone or microphone array may be always on, or may be turned on when the user is detected entering the preset area. The intelligent device can monitor the preset area directly, for example through a corresponding monitoring module arranged on the device. In other embodiments, the intelligent device may instead receive human detection information from other terminals: for example, a camera or an infrared human body detection module is disposed on another terminal in the room, and that terminal feeds the human detection information monitored in the preset area back to the intelligent device to wake the speech recognition engine, which then waits for the user's speech signal. The user therefore does not need to utter a specific wake-up word before issuing a control instruction, which reduces the number of interactions required to wake the intelligent device and improves the user experience. It also avoids keeping the speech recognition engine permanently in a wake-word detection state, which would increase power consumption and the chance of the engine being falsely woken by environmental noise, and thus improves the wake-up accuracy during voice interaction.
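The wake-on-presence flow described above can be illustrated with a minimal sketch (the class and method names here are illustrative assumptions, not part of this disclosure):

```python
class SmartDevice:
    """Sketch: the speech recognition engine stays closed to save power
    and is woken only when a human is detected in the preset area."""

    def __init__(self):
        self.engine_awake = False  # engine is normally in a closed state

    def on_human_detected(self, in_preset_area: bool) -> bool:
        # A wake-up instruction is triggered only when a user enters
        # the preset area; no wake-up word is required.
        if in_preset_area and not self.engine_awake:
            self.engine_awake = True
        return self.engine_awake
```

Detection events outside the preset area leave the engine closed, so the engine never sits in a wake-word-listening state.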
S40: acquiring the distance between a user and the intelligent equipment;
After the speech recognition engine is woken, in order to further improve the recognition accuracy of the intelligent device during voice interaction and at the same time reduce the energy consumption of the engine, the engine is set to recognize only valid voice instructions within a preset range. To determine whether a voice instruction issued by the user is valid, the user's position must first be determined, that is, the distance between the user and the intelligent device must be acquired.
S60: prompting the user to send a voice command by using the volume value corresponding to the distance;
Because the distance between the user and the intelligent device may differ each time it is acquired, and the user will not always issue voice instructions at the same volume from different positions, the user can be prompted, based on the volume value of the user's voice picked up by the microphone, to issue the voice instruction at the volume value corresponding to the distance between the current position and the intelligent device, so as to improve the recognition accuracy of the speech recognition engine and reduce its power consumption.
S70: and recognizing a voice instruction sent by a user based on a voice recognition engine so as to control the intelligent equipment to execute an action corresponding to the voice instruction.
After the speech recognition engine is woken and the user has been prompted, according to the volume value of the user's voice picked up by the microphone, to issue the voice instruction at the volume value corresponding to the distance between the current position and the intelligent device, the speech recognition engine recognizes the received voice instruction. The recognition process relies mainly on syntactic and semantic analysis of natural language, and sometimes also on emotion analysis; through a trained model, artificial intelligence automatically analyzes the user's expression of intent, and the intelligent device is then controlled to execute the interactive action corresponding to that intent. For example, when the user's voice instruction is judged to express a query-class intent, a first keyword is acquired from the query content input by the user: if the user wants to query the current weather, the intelligent device automatically takes 'weather' as the first keyword, searches cloud big data for text content related to the weather, such as the current temperature, humidity, ultraviolet intensity, PM2.5, and wind level, and generates the query result.
When the user's intent belongs to the play class, multimedia content matching the play requirement and the corresponding playing information, such as the playing duration, playing format, and playing bit rate of the multimedia content, are generated according to the play requirement. When the voice instruction is judged to express a play-class intent, a second keyword is acquired from the play requirement input by the user: for example, if the user wants to play 'Fly away' by 'Lijing Ru', the intelligent device automatically takes 'Lijing Ru' and 'Fly away' as the second keywords, searches the local database or cloud big data for the related multimedia content and introduction information, and generates the multimedia content and the playing information, which includes the playing duration, audio encoding rate, sound quality, song introduction, and so on. The intelligent device then executes the playing action itself, or pushes the content and playing information to a speaker connected to the intelligent device for playback.
And when the recognized voice instruction expresses a control-class intent, the controlled terminal and the control information are acquired from the voice instruction, a corresponding control instruction is generated, and the control instruction is sent to the controlled terminal the user wishes to control, so that the terminal executes it. For example, if the user wants to set the air-conditioner temperature to 18°C, the temperature attribute of the air conditioner is modified to 18°C according to the control information, and the corresponding information is fed back to the user.
The voice interaction method is applied to an intelligent device with a built-in speech recognition engine. When a user is detected entering the preset area, the speech recognition engine of the intelligent device is woken, the distance between the user and the intelligent device is acquired, the user is prompted to issue a voice instruction at the volume value corresponding to that distance, and the voice instruction issued by the user is recognized by the speech recognition engine so as to control the intelligent device to execute the corresponding action. Because the speech recognition engine is woken as soon as the user is detected entering the preset area, the user does not need to utter a specific wake-up word before continuing with a control instruction; this reduces the number of interactions required to wake the intelligent device, avoids the cumbersome process of waking the engine with a wake-up word, and lowers the wake-up difficulty during voice interaction. Moreover, since the method only recognizes voice instructions at the volume value corresponding to the user's distance, false recognition of environmental noise is avoided and the wake-up accuracy of the intelligent device's voice interaction is improved.
Further, the voice interaction method based on the above embodiment, before step S30, further includes:
S10: acquiring human body detection information in a preset area; or,
Referring to fig. 2, in this embodiment the intelligent device directly acquires the human body detection information in the preset area. As described above, the intelligent device monitors the preset area directly, for instance through a monitoring module arranged on the device, which may specifically be a camera that monitors through video images, or an infrared human body detection module that monitors through infrared images. When a user is detected entering the preset area, such as the room where the intelligent device is located, a wake-up instruction is generated to wake the speech recognition engine. In another embodiment, the intelligent device may also receive detection data from other sensing elements and, after the data is processed by the processor, generate the wake-up instruction for the speech recognition engine: for example, a temperature-sensor array detecting temperature changes in the preset area, or a carbon-dioxide-concentration sensor array detecting concentration changes there. Whether the user has entered the preset area is judged by monitoring the temperature change or the carbon dioxide concentration change; when it is judged that the user has entered, the corresponding wake-up instruction is generated and the speech recognition engine is woken to wait for the user's speech signal, thereby achieving intelligent voice interaction.
S20: and receiving the human body detection information in the preset area fed back by other terminals.
Referring to fig. 3, in this embodiment the intelligent device receives human body detection information from other terminals: for example, a camera or an infrared human body detection module is disposed on another terminal in the room, and that terminal feeds the human body detection information monitored in the preset area back to the intelligent device; for details, refer to the first embodiment described with reference to fig. 2.
Further, the smart device is equipped with an infrared human body detection device, and the step of acquiring the human body detection information in the preset area includes:
after receiving the starting instruction, controlling the infrared human body detection device to monitor the human body information in the preset area so as to detect whether the user enters the preset area in real time.
In this embodiment, the smart device is fitted with an infrared human body detection device. The device mainly uses the principle that the human body radiates heat to the surrounding environment: it monitors the heat-source distribution in the preset area and detects in real time, from changes in the heat sources there, whether a user has entered. When the heat-source distribution in the preset area is detected to change, or a new heat source appears in the area, it is judged that the user has entered the preset area.
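The heat-source-change judgment described above may be sketched as follows (the tolerance value and the function name are illustrative assumptions; each heat source is modeled as a position reported by the infrared module):

```python
def user_entered(prev_sources, curr_sources, tol=0.5):
    """Judge that a user entered the preset area when a new heat source
    appears or the existing heat-source distribution changes.
    Sources are (x, y) positions from the infrared detection device."""
    if len(curr_sources) > len(prev_sources):
        return True  # a new heat source appeared in the preset area
    # distribution changed: some existing source moved by more than `tol`
    moved = any(
        abs(px - cx) > tol or abs(py - cy) > tol
        for (px, py), (cx, cy) in zip(sorted(prev_sources), sorted(curr_sources))
    )
    return moved
```

A stationary, unchanged distribution yields no wake-up; the real module would of course work on raw thermal frames rather than point lists.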
Further, referring to fig. 2 or 3, before step S60, the voice interaction method of the above embodiment further includes:
S50: judging whether the user is in the recognition range of the voice recognition engine according to the acquired distance parameter;
In this embodiment, the speech recognition engine only recognizes speech whose volume reaches a preset threshold, where the preset threshold is the minimum volume value the engine can recognize when the user is closest to the smart device. As the user moves away from the smart device, the user must keep raising the volume of the speech until it reaches the user's maximum; the distance between the user and the smart device at that point can be taken as the recognition range of the speech recognition engine.
Further, referring to fig. 4, in a first embodiment based on the voice interaction method of the above embodiment, step S50 includes:
S51: according to a prestored mapping relation between distance and lighting length, displaying the distance parameter between the user and the intelligent device, acquired by the infrared human body detection module, as the lighting length of the indicator light strip;
In this embodiment, the distance between the user and the smart device is obtained mainly from the infrared human body detection module, and after the distance parameter is acquired, it must be judged whether the user is within the recognition range of the speech recognition engine. To let the user know intuitively whether a voice instruction issued from the current position can be recognized by the engine, the real-time distance parameter acquired by the infrared module is displayed, by means of the smart device's indicator light strip and according to the pre-established mapping between distance and lighting length, as the lighting length of the strip. Here the indicator strip consists of 5 LED beads arranged in a row, and the lighting length can equally be expressed as the lighting number of the strip.
S52: when the lighting length is greater than or equal to a preset length, judging that the user is within the recognition range;
Whether the user is in the recognition range of the voice recognition engine is judged against a preset lighting length, or lighting number, of the indicator light strip corresponding to the preset recognition range of the engine. For example, if the threshold is set at three-fifths of the strip, then whenever three-fifths or more of the strip is lit, that is, 3 or more of the 5 LED beads, the user is judged to be currently within the recognition range.
S53: when the lighting length is less than the preset length, judging that the user is out of the recognition range.
And when the lighting length of the indicator light strip is less than three-fifths, that is, fewer than 3 LED beads are lit (only 2 or 1), the user is judged to be currently out of the recognition range of the voice recognition engine. To reduce the power consumption of the speech recognition engine, the engine may then be selectively turned off, or a prompt may be output asking the user to approach the smart device so as to shorten the distance between the two.
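With the 5-bead strip of steps S51–S53 above, the distance-to-lighting mapping and the in-range judgment might look like this sketch (the maximum display distance is an assumed value, and the closer-means-more-beads direction is an illustrative choice):

```python
NUM_BEADS = 5        # indicator strip of 5 LED beads in a row
PRESET_BEADS = 3     # three-fifths of the strip = in-range threshold
MAX_DISTANCE = 5.0   # assumed farthest distance shown on the strip, in meters

def lit_beads(distance_m: float) -> int:
    """Map the measured user distance to a number of lit beads:
    the closer the user, the more beads are lit."""
    ratio = max(0.0, 1.0 - distance_m / MAX_DISTANCE)
    return min(NUM_BEADS, round(ratio * NUM_BEADS))

def in_recognition_range(distance_m: float) -> bool:
    # The user is judged in range when at least 3 of 5 beads are lit.
    return lit_beads(distance_m) >= PRESET_BEADS
```

With these assumed values, a user 1 m away lights 4 beads and is in range, while a user 4 m away lights only 1 bead and is out of range.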
Further, referring to fig. 2 or 3, the voice interaction method based on the above embodiment further includes:
s80: and when the user is judged to be out of the recognition range of the voice recognition engine, the voice recognition engine of the intelligent device is closed or a prompt that the user approaches the intelligent device is output.
In the present embodiment, when the user is judged to be out of the recognition range of the speech recognition engine, there may be two cases. The first is that the user really is too far from the smart device: even if the user outputs a voice signal at maximum volume, the volume value received by the speech recognition engine after long-distance attenuation is still smaller than the engine's preset threshold, so the user needs to move toward the smart device to reduce the distance between the two.
The second is that the distance between the user and the smart device does not exceed the preset maximum distance, where the preset maximum distance is the farthest distance at which the speech recognition engine can still receive and recognize a voice signal output by the user at maximum volume. In this case, if the speech recognition engine cannot recognize the voice signal sent by the user, the user can be prompted to raise the output volume, that is, to increase the volume of the voice signal. Of course, as above, to reduce the power consumption of the smart device, or to prevent the infrared human body detection module from falsely waking the speech recognition engine, the engine can be closed when the user does not want to interact with the smart device by voice.
Further, referring to fig. 5, in a second embodiment based on the voice interaction method of the above embodiment, step S60 includes:
S61: receiving a voice instruction in the recognition range of the voice recognition engine, and calculating a volume value of the voice instruction;
In this embodiment, when it is judged that the user is within the recognition range of the speech recognition engine, the microphone can be controlled to pick up the voice instruction issued by the user in that range, and the instruction is processed after it is received. Since a voice instruction is a sound wave and keeps attenuating during transmission, the volume value picked up by the microphone depends mainly on the distance of the sound source from the microphone and the intensity of the source, that is, on the distance between the user and the smart device and the volume at which the user speaks. In addition, since the speech recognition engine only recognizes voice signals whose volume value reaches the preset threshold, the volume value of the received voice instruction must be calculated to judge whether it is a valid instruction that the engine can recognize.
S62: acquiring a target volume value matched with the current distance according to a mapping relation between the pre-stored distance and the target volume value;
When the currently received voice instruction needs to be judged for validity, the target volume value corresponding to the user's current position is simply looked up from the prestored mapping between distance and target volume value, which improves the response speed of the smart device and the recognition efficiency during voice interaction.
S63: judging whether the volume value is smaller than the target volume value;
if yes, go to step S64, otherwise go to step S65;
S64: controlling the indicator light strip to be partially lit to prompt that the currently received voice instruction is invalid;
In order to let the user intuitively know whether the currently issued voice instruction is a valid instruction that the speech recognition engine can recognize, the volume value of the received instruction is indicated by the lighting length, or the lighting number, of the indicator light strip. When the volume value is smaller than the target volume value corresponding to the user's current position, the indicator light strip is only partially lit to prompt the user that the current instruction is invalid. The specific number of lit beads follows the above embodiment: the larger the received volume value, the more beads are lit, or the longer the lighting length, so as to prompt the user to raise the output volume of the voice instruction.
S65: controlling the indicator light strip to be fully lit to prompt that the currently received voice instruction is valid.
When the volume value of the currently received voice instruction is greater than or equal to the target volume value corresponding to the user's current position, the indicator light strip is controlled according to a preset program to be fully lit, prompting the user that the currently issued voice instruction is valid.
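The volume-validity check of steps S61–S65 can be sketched as follows (the distance-to-target-volume mapping values are illustrative assumptions, not values from this disclosure):

```python
# Prestored mapping: distance (m) -> minimum (target) volume value (dB), assumed
TARGET_VOLUME = {1: 40, 2: 45, 3: 50, 4: 55, 5: 60}

def check_instruction(distance_m: int, measured_volume_db: float):
    """Return (valid, strip_state): the instruction is valid only when
    its measured volume reaches the target for the current distance."""
    target = TARGET_VOLUME[distance_m]
    if measured_volume_db < target:
        return False, "partially lit"  # prompt the user: instruction invalid
    return True, "fully lit"          # prompt the user: instruction valid
```

The table lookup replaces any on-the-fly computation, which is the response-speed point made above.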
Further, referring to fig. 6, based on the voice interaction method of the above embodiment, step S70 includes:
S71: when the currently received voice instruction is valid, that is, when the indicator light strip is fully lit, recognizing the corresponding voice instruction as text information based on an acoustic model and grammatical structure;
In this embodiment, when the indicator light strip is fully lit, the user is prompted that the currently issued voice instruction is valid, and the speech recognition engine can recognize this valid instruction to obtain the user's intent. Recognizing a voice instruction is chiefly a matter of recognizing the user's intention, which depends mainly on syntactic and semantic analysis of natural language; in other embodiments emotion analysis may also be needed. Since syntactic, semantic, and emotion analysis are all performed on text, the voice instruction must first be recognized as text information; this is done, for example, by combining the acoustic model and grammatical structure in speech recognition with the confidence percentages of the words formed.
S72: after performing syntactic and/or semantic analysis on the text information, extracting a text segment that matches a preset keyword pointing to the user's intention;
Natural language understanding trains a model on text information to recognize and judge the user's expression of intent; with the trained model, the user's intent can be analyzed automatically by artificial intelligence. The trained model depends on keywords: by configuring preset keywords pointing to different user intentions, different keywords are trained to correspond to different intentions. Then, after the voice instruction is recognized as text information and processed by syntactic and semantic analysis in natural language understanding, the text segment matching the preset keywords is extracted from the text information.
S73: judging the user's intention based on the text segment, and outputting interactive data responding to the user according to that intention or controlling the smart device to execute a feedback action responding to the user's intention.
After the text segment matching a preset keyword pointing to the user's intention is extracted, the intention pointed to by the voice instruction is judged from that segment, and corresponding interactive data or a control instruction is output based on the category of the intention. According to the keyword information, user intentions can be divided into a query category, a control category, and a play category. When the intention is determined from the text segment, its category needs to be further determined: for example, the matching action is determined from among the chat, encyclopedia, real-time weather query, forecast weather query, playlist play, song play, household-appliance control, and household-appliance query categories, and the category of the user's intention is thereby determined. Interactive data responding to the user is then output based on that category, such as a weather forecast or a traffic-condition query result; or the smart device is controlled to play music or broadcast the information the user requires, or to execute the functional action the user requires, such as the cooling action of an air conditioner, the dust-collecting action of a vacuum cleaner, the juicing action of a juicer, or the air-purifying action of an air purifier.
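The keyword-based intent judgment of steps S71–S73 might be sketched as follows (the keyword lists and category names are illustrative assumptions):

```python
# Preset keywords pointing to each user-intention category (assumed examples)
INTENT_KEYWORDS = {
    "query":   ["weather", "temperature", "humidity"],
    "play":    ["play", "song", "music"],
    "control": ["turn on", "turn off", "set", "degrees"],
}

def judge_intent(text: str) -> str:
    """Extract the text segment matching a preset keyword and
    return the user-intention category it points to."""
    lowered = text.lower()
    for category, keywords in INTENT_KEYWORDS.items():
        if any(kw in lowered for kw in keywords):
            return category
    return "chat"  # no preset keyword matched: fall back to the chat category
```

A real system would use a trained model rather than substring matching, but the dispatch structure — keyword, category, then response — is the same.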
In addition, an embodiment of the present invention further provides a storage medium, where the storage medium stores a voice interaction program, and the voice interaction program, when executed by a processor, implements the steps of the voice interaction method described above.
The method implemented when the voice interaction program is executed may refer to each embodiment of the voice interaction method of the present invention, and details thereof are not repeated herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (8)

1. A voice interaction method is applied to intelligent equipment and is characterized in that a voice recognition engine is arranged in the intelligent equipment, and the voice interaction method comprises the following steps:
when detecting that a user enters a preset area, awakening a voice recognition engine of the intelligent device;
acquiring the distance between a user and the intelligent equipment;
prompting the user to send a voice command by using the volume value corresponding to the distance;
recognizing a voice instruction sent by a user based on a voice recognition engine so as to control the intelligent equipment to execute an action corresponding to the voice instruction;
before the step of prompting the user to send out a voice command with a volume value corresponding to the distance, the method further comprises the following steps:
judging whether the user is in the recognition range of the voice recognition engine according to the acquired distance parameter;
if so, prompting the user to issue a voice command at the volume value corresponding to the distance;
the intelligent device is further provided with an indicator light strip, and the step of judging whether the user is in the recognition range of the voice recognition engine according to the acquired distance parameters comprises the following steps:
according to a prestored mapping relation between distance and lighting length, displaying the distance parameter between the user and the intelligent device, acquired by the infrared human body detection module, as the lighting length of the indicator light strip;
when the lighting length is greater than or equal to a preset length, judging that the user is within the recognition range;
and when the lighting length is less than the preset length, judging that the user is out of the recognition range.
2. The voice interaction method of claim 1, further comprising, before the step of waking up a voice recognition engine of the smart device when the user is detected to enter the preset area:
acquiring human body detection information in a preset area; or,
and receiving the human body detection information in the preset area fed back by other terminals.
3. The voice interaction method according to claim 2, wherein the intelligent device is further provided with an infrared human body detection device, and the step of acquiring the human body detection information in the preset area comprises:
after receiving the starting instruction, controlling the infrared human body detection device to monitor the human body information in the preset area so as to detect whether the user enters the preset area in real time.
4. The voice interaction method according to claim 2 or 3, wherein when the user is determined to be out of the recognition range of the voice recognition engine, the voice recognition engine of the smart device is turned off or a prompt that the user is close to the smart device is output.
5. The voice interaction method of claim 4, wherein the step of prompting the user to issue a voice command at a volume value corresponding to the distance comprises:
receiving a voice command within the recognition range of the voice recognition engine, and calculating the volume value of the voice command;
acquiring a target volume value matched to the current distance according to a prestored mapping relation between distance and target volume value;
determining whether the volume value is less than the target volume value;
if so, controlling the indicator light strip to light up to prompt that the currently received voice command is invalid;
if not, controlling the indicator light strip to light up completely to prompt that the currently received voice command is valid.
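Claim 5's validity check can be sketched as a lookup-then-compare. The distance bands and decibel values below are invented for illustration; the patent only specifies that a prestored mapping from distance to target volume exists.

```python
DISTANCE_TO_TARGET_DB = [   # (max distance in metres, required volume in dB)
    (1.0, 40.0),            # hypothetical bands, not the patent's values
    (3.0, 50.0),
    (5.0, 60.0),
]

def target_volume(distance_m: float) -> float:
    """Return the prestored target volume matched to the current distance."""
    for max_d, db in DISTANCE_TO_TARGET_DB:
        if distance_m <= max_d:
            return db
    return DISTANCE_TO_TARGET_DB[-1][1]   # beyond the table: loudest band

def command_is_valid(volume_db: float, distance_m: float) -> bool:
    """Claim 5's test: valid iff the volume is not below the target
    volume for the user's current distance."""
    return volume_db >= target_volume(distance_m)
```

A 45 dB command is valid from half a metre (target 40 dB) but invalid from two metres (target 50 dB), which is exactly the behaviour the light strip signals to the user.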
6. The voice interaction method of claim 5, wherein the step of recognizing the voice command issued by the user to control the smart device to execute the action corresponding to the voice command comprises:
when the currently received voice command is valid, that is, when the indicator light strip is completely lit, recognizing the corresponding voice command as text information based on an acoustic model and a grammatical structure;
after performing syntactic and/or semantic analysis on the text information, extracting a text segment that matches a preset keyword pointing to the user's intent;
and determining the user's intent based on the text segment, and, according to that intent, outputting interactive data in response to the user or controlling the smart device to execute a feedback action responding to the user's intent.
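The keyword-extraction step of claim 6 reduces, in its simplest form, to scanning the recognised text for preset keywords and mapping the match to an intent. The keyword table below is an invented example for an air-conditioner-style device, not the patent's actual keyword set, and a real implementation would use the syntactic/semantic analysis the claim describes rather than plain substring search.

```python
INTENT_KEYWORDS = {          # preset keyword -> user intent (hypothetical)
    "turn on": "power_on",
    "turn off": "power_off",
    "cooler": "lower_temperature",
    "warmer": "raise_temperature",
}

def extract_intent(text: str):
    """Return (matched_segment, intent) for the first preset keyword
    found in the recognised text, or (None, None) if nothing matches."""
    lowered = text.lower()
    for keyword, intent in INTENT_KEYWORDS.items():
        if keyword in lowered:
            return keyword, intent
    return None, None
```

The returned intent would then drive the final step of the claim: either an interactive reply to the user or a feedback action on the device.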
7. A smart device comprising a voice recognition engine, a memory, a processor, and a voice interaction program stored in the memory and executable on the processor, wherein:
the voice recognition engine is configured to recognize received voice commands;
and the voice interaction program, when executed by the processor, implements the steps of the voice interaction method of any one of claims 1 to 6.
8. A storage medium, characterized in that the storage medium stores a voice interaction program, which when executed by a processor implements the steps of the voice interaction method according to any one of claims 1 to 6.
CN201810100378.7A 2018-01-31 2018-01-31 Voice interaction method, intelligent device and storage medium Active CN108320742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810100378.7A CN108320742B (en) 2018-01-31 2018-01-31 Voice interaction method, intelligent device and storage medium


Publications (2)

Publication Number Publication Date
CN108320742A CN108320742A (en) 2018-07-24
CN108320742B true CN108320742B (en) 2021-09-14

Family

ID=62890474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810100378.7A Active CN108320742B (en) 2018-01-31 2018-01-31 Voice interaction method, intelligent device and storage medium

Country Status (1)

Country Link
CN (1) CN108320742B (en)

Families Citing this family (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110853619B (en) * 2018-08-21 2022-11-25 上海博泰悦臻网络技术服务有限公司 Man-machine interaction method, control device, controlled device and storage medium
CN109377987B (en) 2018-08-31 2020-07-28 百度在线网络技术(北京)有限公司 Interaction method, device, equipment and storage medium between intelligent voice equipment
US10770061B2 (en) * 2018-10-06 2020-09-08 Harman International Industries, Incorporated False trigger correction for a voice-activated intelligent device
CN109188927A (en) * 2018-10-15 2019-01-11 深圳市欧瑞博科技有限公司 Appliance control method, device, gateway and storage medium
CN109065060B (en) * 2018-10-23 2021-05-07 维沃移动通信有限公司 Voice awakening method and terminal
CN109240115B (en) * 2018-10-26 2022-07-15 北京小米移动软件有限公司 Intelligent device control method and device and storage medium
CN109695939A (en) * 2018-11-08 2019-04-30 佛山市中格威电子有限公司 A kind of air-conditioning speech control system based on remote terminal
CN111240561A (en) * 2018-11-29 2020-06-05 华为技术有限公司 Voice interaction method and user equipment
CN109741738A (en) * 2018-12-10 2019-05-10 平安科技(深圳)有限公司 Sound control method, device, computer equipment and storage medium
CN109783800B (en) * 2018-12-13 2024-04-12 北京百度网讯科技有限公司 Emotion keyword acquisition method, device, equipment and storage medium
CN109696833A (en) * 2018-12-19 2019-04-30 歌尔股份有限公司 A kind of intelligent home furnishing control method, wearable device and sound-box device
CN109725545A (en) * 2018-12-27 2019-05-07 广东美的厨房电器制造有限公司 Smart machine and its control method, computer readable storage medium
CN109785836B (en) * 2019-01-28 2021-03-30 三星电子(中国)研发中心 Interaction method and device
CN109920420A (en) * 2019-03-08 2019-06-21 四川长虹电器股份有限公司 A kind of voice wake-up system based on environment measuring
CN110037860A (en) * 2019-03-26 2019-07-23 北京理工大学 A kind of domestic intelligent moving first-aid case
US11580969B2 (en) * 2019-03-27 2023-02-14 Lg Electronics Inc. Artificial intelligence device and method of operating artificial intelligence device
CN109976703B (en) * 2019-04-04 2022-05-24 广东美的厨房电器制造有限公司 Guidance instruction method, computer-readable storage medium, and cooking apparatus
CN111833862B (en) * 2019-04-19 2023-10-20 佛山市顺德区美的电热电器制造有限公司 Control method of equipment, control equipment and storage medium
CN111949240A (en) * 2019-05-16 2020-11-17 阿里巴巴集团控股有限公司 Interaction method, storage medium, service program, and device
CN110136716B (en) * 2019-05-21 2021-11-23 四川虹美智能科技有限公司 Voice interaction processing method and voice interaction equipment
CN112053683A (en) * 2019-06-06 2020-12-08 阿里巴巴集团控股有限公司 Voice instruction processing method, device and control system
CN112102820A (en) * 2019-06-18 2020-12-18 北京京东尚科信息技术有限公司 Interaction method, interaction device, electronic equipment and medium
CN110288992A (en) * 2019-06-18 2019-09-27 王东 A kind of exchange method, device, electronic equipment and computer readable storage medium
CN112397062A (en) * 2019-08-15 2021-02-23 华为技术有限公司 Voice interaction method, device, terminal and storage medium
CN110648673A (en) * 2019-08-21 2020-01-03 珠海思格特智能系统有限公司 Voice recognition method and system for intelligent stamping machine
CN110782889A (en) * 2019-08-22 2020-02-11 腾讯科技(深圳)有限公司 Voice operation method and related equipment
CN111123851A (en) * 2019-11-11 2020-05-08 珠海格力电器股份有限公司 Method, device and system for controlling electric equipment according to user emotion
CN110910877B (en) * 2019-11-27 2022-06-21 广东美的制冷设备有限公司 Household appliance control method and device and household appliance
CN111208736B (en) * 2019-12-17 2023-10-27 中移(杭州)信息技术有限公司 Intelligent sound box control method and device, electronic equipment and storage medium
CN110990553A (en) * 2019-12-18 2020-04-10 上海智勘科技有限公司 Coupling method and system of intelligent sound box voice interaction system and insurance agent
CN111128165A (en) * 2019-12-27 2020-05-08 星络智能科技有限公司 Storage medium, interactive device and voice acquisition method thereof
CN111081224B (en) * 2019-12-31 2022-12-23 中国石油天然气集团有限公司 Data acquisition method and device for petroleum safety event
CN111091828B (en) * 2019-12-31 2023-02-14 华为技术有限公司 Voice wake-up method, device and system
CN113489627A (en) * 2020-03-16 2021-10-08 深圳市艾特智能科技有限公司 Intelligent device voice awakening method and system, readable storage medium and electronic device
CN113495489A (en) * 2020-04-07 2021-10-12 深圳爱根斯通科技有限公司 Automatic configuration method and device, electronic equipment and storage medium
CN111554289A (en) * 2020-04-27 2020-08-18 河北雄安中税盟科技股份有限公司 Intelligent voice interaction method and storage medium
CN111562346A (en) * 2020-05-06 2020-08-21 江苏美的清洁电器股份有限公司 Control method, device and equipment of dust collection station, dust collection station and storage medium
CN111667827B (en) * 2020-05-28 2023-10-17 北京小米松果电子有限公司 Voice control method and device for application program and storage medium
CN111640433A (en) * 2020-06-01 2020-09-08 珠海格力电器股份有限公司 Voice interaction method, storage medium, electronic equipment and intelligent home system
CN111739533A (en) * 2020-07-28 2020-10-02 睿住科技有限公司 Voice control system, method and device, storage medium and voice equipment
CN114080084A (en) * 2020-08-17 2022-02-22 合肥美的电冰箱有限公司 Household appliance and light interaction method and device thereof
CN112185377A (en) * 2020-09-28 2021-01-05 珠海格力电器股份有限公司 Method and device for solving intelligent voice repeated response
CN112460757A (en) * 2020-11-13 2021-03-09 芜湖美智空调设备有限公司 Air conditioner, voice control method thereof and storage medium
CN112637733B (en) * 2020-12-08 2023-02-28 歌尔科技有限公司 Method and device for automatically adjusting volume of voice equipment and voice equipment
CN112820288A (en) * 2020-12-31 2021-05-18 北京搜狗科技发展有限公司 Interaction method and earphone equipment
CN112734500B (en) * 2021-01-27 2021-12-07 深圳市奥极互动科技有限公司 Interactive experience system of new retail store
CN113205806A (en) * 2021-04-02 2021-08-03 广东小天才科技有限公司 Output switching method, mouse, terminal equipment and readable storage medium
CN113330513A (en) * 2021-04-20 2021-08-31 华为技术有限公司 Voice information processing method and device
CN113192509A (en) * 2021-05-28 2021-07-30 北京京东方显示技术有限公司 Voice interaction system and method and intelligent device
CN113742517B (en) * 2021-08-11 2022-09-27 北京百度网讯科技有限公司 Voice packet generation method and device, electronic equipment and storage medium
CN113593565B (en) * 2021-09-29 2021-12-17 深圳大生活家科技有限公司 Intelligent home device management and control method and system
CN114286362A (en) * 2021-11-08 2022-04-05 厦门阳光恩耐照明有限公司 Method and system for local voice network distribution and electronic equipment
CN114527711A (en) * 2021-11-08 2022-05-24 厦门阳光恩耐照明有限公司 Intelligent equipment control method and device based on local voice and electronic equipment
CN114285930B (en) * 2021-12-10 2024-02-23 杭州逗酷软件科技有限公司 Interaction method, device, electronic equipment and storage medium
CN114363105A (en) * 2022-01-17 2022-04-15 深圳市海曼科技股份有限公司 Intelligent equipment control method based on voice recognition
CN114520003A (en) * 2022-02-28 2022-05-20 安徽淘云科技股份有限公司 Voice interaction method and device, electronic equipment and storage medium
CN115101048B (en) * 2022-08-24 2022-11-11 深圳市人马互动科技有限公司 Science popularization information interaction method, device, system, interaction equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104078041A (en) * 2014-06-26 2014-10-01 美的集团股份有限公司 Voice recognition method and system
CN104424073A (en) * 2013-08-21 2015-03-18 联想(北京)有限公司 Information processing method and electronic equipment
TW201513094A (en) * 2013-09-27 2015-04-01 John C Wang Voice recording and management devices, and operational methods thereof
CN104580699A (en) * 2014-12-15 2015-04-29 广东欧珀移动通信有限公司 Method and device for acoustically controlling intelligent terminal in standby state
CN105700372A (en) * 2016-03-11 2016-06-22 珠海格力电器股份有限公司 Intelligent device and control method thereof
CN107515944A (en) * 2017-08-31 2017-12-26 广东美的制冷设备有限公司 Exchange method, user terminal and storage medium based on artificial intelligence

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003330499A (en) * 2002-05-13 2003-11-19 Canon Inc Automatic reaction system
KR100705563B1 (en) * 2004-12-07 2007-04-10 삼성전자주식회사 Speech Recognition System capable of Controlling Automatically Inputting Level and Speech Recognition Method using the same
JP5075664B2 (en) * 2008-02-15 2012-11-21 株式会社東芝 Spoken dialogue apparatus and support method
CN205389262U (en) * 2016-02-02 2016-07-20 深圳市优思曼科技有限公司 Display screen's intelligent sound box
CN107146612B (en) * 2017-04-10 2020-05-15 北京猎户星空科技有限公司 Voice guidance method and device, intelligent equipment and server



Similar Documents

Publication Publication Date Title
CN108320742B (en) Voice interaction method, intelligent device and storage medium
CN108735209B (en) Wake-up word binding method, intelligent device and storage medium
CN108711430B (en) Speech recognition method, intelligent device and storage medium
CN108735216B (en) Voice question searching method based on semantic recognition and family education equipment
CN109712621B (en) Voice interaction control method and terminal
CN105810194B (en) Speech-controlled information acquisition methods and intelligent terminal under standby mode
KR102339819B1 (en) Method and device for generating natural language expression by using framework
CN109901698B (en) Intelligent interaction method, wearable device, terminal and system
CN110989882B (en) Control method, electronic device and computer readable storage medium
CN106878390B (en) Electronic pet interaction control method and device and wearable equipment
CN109561211B (en) Information display method and mobile terminal
CN108874280B (en) Screen division method, terminal and computer readable storage medium
KR102369083B1 (en) Voice data processing method and electronic device supporting the same
CN112735418B (en) Voice interaction processing method, device, terminal and storage medium
WO2019169991A1 (en) Display method and mobile terminal
CN112689201A (en) Barrage information identification method, barrage information display method, server and electronic equipment
CN108391253B (en) application program recommendation method and mobile terminal
CN107862059A (en) A kind of song recommendations method and mobile terminal
CN109933196B (en) Screen control method and device and terminal equipment
CN111522592A (en) Intelligent terminal awakening method and device based on artificial intelligence
CN109992753B (en) Translation processing method and terminal equipment
CN109472825B (en) Object searching method and terminal equipment
CN111125307A (en) Chat record query method and electronic equipment
CN109164908B (en) Interface control method and mobile terminal
CN113393838A (en) Voice processing method and device, computer readable storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant