CN113689853A - Voice interaction method and device, electronic equipment and storage medium


Info

Publication number
CN113689853A
Authority
CN
China
Prior art keywords: current, voice, voice interaction, control module, scene identifier
Legal status: Pending (the status is an assumption and is not a legal conclusion)
Application number: CN202110919033.6A
Other languages: Chinese (zh)
Inventors: 李志明, 周力恒, 韩锋
Current Assignee (the listed assignees may be inaccurate): Beijing Xiaomi Mobile Software Co Ltd; Beijing Xiaomi Pinecone Electronic Co Ltd
Original Assignee: Beijing Xiaomi Mobile Software Co Ltd; Beijing Xiaomi Pinecone Electronic Co Ltd
Application filed by Beijing Xiaomi Mobile Software Co Ltd and Beijing Xiaomi Pinecone Electronic Co Ltd
Priority claimed from application CN202110919033.6A
Publication of CN113689853A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26: Speech to text systems
    • G10L2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The disclosure relates to a voice interaction method and apparatus, an electronic device, and a storage medium. The method includes: receiving a current scene identifier sent by a central control module, where the current scene identifier is determined by the central control module according to current trigger information, and the current trigger information represents environmental information, collected by a monitoring device, that reaches a trigger threshold; determining a voice response strategy according to the current scene identifier; and initiating voice interaction according to the voice response strategy. With this method, the audio device can actively initiate voice interaction according to the current scene identifier in combination with the voice response strategy, making it more proactive and broadly applicable, better suited to retail or exhibition-hall environments, and expanding the usage scenarios of the audio device.

Description

Voice interaction method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of communications, and in particular, to a voice interaction method and apparatus, an electronic device, and a storage medium.
Background
With the development of Internet of Things (IoT) technology, daily life is becoming increasingly intelligent. The IoT is a network that extends and expands on the basis of the Internet, connecting various devices to the Internet to exchange information and communicate. IoT technology is widely applied in smart homes: various smart appliances in a household can be remotely controlled from a mobile terminal, bringing great convenience to people's lives. A smart audio device is indispensable in a smart home, and a user can control other smart devices through the smart audio device.
In the related art, the smart audio device must be explicitly invoked by a user, which severely limits its applications.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a voice interaction method, apparatus, electronic device, and storage medium.
According to a first aspect of the embodiments of the present disclosure, a voice interaction method is provided, which is applied to an audio device, where the audio device is in communication connection with a central control module, and the method includes:
receiving a current scene identifier sent by the central control module, where the current scene identifier is determined by the central control module according to current trigger information, and the current trigger information represents: environmental information, collected by a monitoring device, that reaches a trigger threshold;
determining a voice response strategy according to the current scene identifier;
and initiating voice interaction according to the voice response strategy.
In some embodiments, the determining a voice response policy according to the current scene identifier includes:
acquiring first configuration information, wherein the first configuration information comprises a corresponding relation between an application scene identifier and a voice response strategy;
and determining a corresponding voice response strategy in the first configuration information according to the current scene identifier.
In some embodiments, the voice response policy comprises: a plurality of interactive contents associated with the current scene identification; the initiating voice interaction according to the voice response strategy comprises:
determining and playing initial interactive content according to the current scene identifier;
receiving a feedback response of a user;
and determining and playing target interactive content corresponding to the keywords according to the keywords in the feedback response.
In some embodiments, the method further comprises:
stopping voice interaction in response to triggering a voice termination condition;
wherein the voice termination condition comprises: and the time length for receiving the feedback response exceeds the preset time length, or the feedback response comprises a preset termination keyword, or a termination instruction is received.
In some embodiments, the determining and playing the target interactive content corresponding to the keyword according to the keyword in the feedback response includes:
in response to the feedback response including a control instruction and a smart-device keyword, sending the control instruction to the central control module, and playing the target interactive content;
wherein the target interactive content comprises: a prompt message that the control instruction has been executed.
According to a second aspect of the embodiments of the present disclosure, a voice interaction method is provided, which is applied to a central control module, where the central control module is in communication connection with an audio device and a monitoring device in a current environment, where the monitoring device includes a sensor and/or an image capturing device disposed in the current environment; the method comprises the following steps:
receiving current trigger information sent by monitoring equipment, wherein the current trigger information is used for representing: environmental information which is collected by the monitoring equipment and reaches a trigger threshold value;
determining a corresponding current scene identifier according to the current trigger information;
and sending the current scene identification to the audio equipment.
In some embodiments, the determining, according to the current trigger information, a corresponding current scene identifier includes:
calling second configuration information, wherein the second configuration information comprises a corresponding relation between trigger information and an application scene identifier;
and determining the corresponding current scene identifier in the second configuration information according to the current trigger information.
In some embodiments, the method further comprises:
receiving a control instruction sent by the audio equipment;
and controlling the corresponding intelligent equipment to operate according to the control instruction.
According to a third aspect of the embodiments of the present disclosure, a voice interaction apparatus is provided, which is applied to an audio device, where the audio device is in communication connection with a central control module, and the apparatus includes:
a first receiving module, configured to receive a current scene identifier sent by the central control module, where the current scene identifier is determined by the central control module according to current trigger information, and the current trigger information represents: environmental information, collected by a monitoring device, that reaches a trigger threshold;
the first determining module is used for determining a voice response strategy according to the current scene identifier;
and the voice interaction module is used for initiating voice interaction according to the voice response strategy.
In some embodiments, the first determination module is to:
acquiring first configuration information, wherein the first configuration information comprises a corresponding relation between an application scene identifier and a voice response strategy;
and determining a corresponding voice response strategy in the first configuration information according to the current scene identifier.
In some embodiments, the voice response policy comprises: a plurality of interactive contents associated with the current scene identification; the voice interaction module is used for:
determining and playing initial interactive content according to the current scene identifier;
receiving a feedback response of a user;
and determining and playing target interactive content corresponding to the keywords according to the keywords in the feedback response.
In some embodiments, the apparatus further comprises: the first control module is used for responding to a trigger voice termination condition and stopping voice interaction;
wherein the voice termination condition comprises: and the time length for receiving the feedback response exceeds the preset time length, or the feedback response comprises a preset termination keyword, or a termination instruction is received.
In some embodiments, the voice interaction module is further to:
in response to the feedback response including a control instruction and a smart-device keyword, sending the control instruction to the central control module, and playing the target interactive content;
wherein the target interactive content comprises: a prompt message that the control instruction has been executed.
According to a fourth aspect of the embodiments of the present disclosure, a voice interaction apparatus is provided, which is applied to a central control module, where the central control module is in communication connection with an audio device and a monitoring device in a current environment, where the monitoring device includes a sensor and/or an image capturing device disposed in the current environment; the device comprises:
a second receiving module, configured to receive current trigger information sent by a monitoring device, where the current trigger information is used to characterize: environmental information which is collected by the monitoring equipment and reaches a trigger threshold value;
a second determining module, configured to determine, according to the current trigger information, a corresponding current scene identifier;
and the sending module is used for sending the current scene identifier to the audio equipment.
In some embodiments, the second determination module is to:
calling second configuration information, wherein the second configuration information comprises a corresponding relation between trigger information and an application scene identifier;
and determining the corresponding current scene identifier in the second configuration information according to the current trigger information.
In some embodiments, the apparatus further comprises: a second control module, configured for:
receiving a control instruction sent by the audio equipment;
and controlling the corresponding intelligent equipment to operate according to the control instruction.
According to a fifth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the voice interaction method as described in any one of the above.
According to a sixth aspect of the embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium, in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the voice interaction method as described in any one of the above.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects: the central control module determines the current scene identifier according to the current trigger information, and the audio device can actively initiate voice interaction according to the current scene identifier in combination with a voice response strategy. This makes the device more proactive and broadly applicable, better suited to retail or exhibition-hall environments, and expands the usage scenarios of the audio device.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic diagram of a voice interaction system shown in accordance with an exemplary embodiment.
FIG. 2 is a flow chart illustrating a method according to an example embodiment.
FIG. 3 is a flow chart illustrating a method according to an example embodiment.
FIG. 4 is a flow chart illustrating a method according to an example embodiment.
FIG. 5 is an interaction diagram illustrating a method in accordance with an exemplary embodiment.
FIG. 6 is a block diagram illustrating an apparatus according to an example embodiment.
Fig. 7 is a block diagram illustrating an apparatus according to an example embodiment.
FIG. 8 is a block diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
In the related art, the interaction mode of the smart audio device is limited: it must first be manually invoked, and only then executes a control function according to an explicit user instruction. Such voice interaction has poor flexibility and applicability and supports only a single round of dialog. In environments with visitors, such as exhibition halls or retail stores, the smart audio device cannot act as a receptionist or shopping guide, which limits the application of the Internet of Things.
The present disclosure provides a voice interaction method applied to an audio device, where the audio device is communicatively connected to a central control module. The method includes: receiving a current scene identifier sent by the central control module, where the current scene identifier is determined by the central control module according to current trigger information, and the current trigger information represents environmental information, collected by a monitoring device, that reaches a trigger threshold; determining a voice response strategy according to the current scene identifier; and initiating voice interaction according to the voice response strategy. In this method, the central control module determines the current scene identifier according to the current trigger information, and the audio device can actively initiate voice interaction according to the current scene identifier in combination with the voice response strategy. This makes the device more proactive and broadly applicable, better suited to retail or exhibition-hall environments, and expands the usage scenarios of the audio device.
In an exemplary embodiment, the present embodiment provides a voice interaction method, which is applied to an audio device, where the audio device is in communication connection with a central control module. The audio device is, for example, an intelligent sound box, an intelligent screen, or other intelligent audio and video device. The central control module is, for example, a central control system or a server.
As shown in fig. 1, the voice interaction system in this embodiment includes a central control module 10, an audio device 20, a smart device 30, an application 40, and a monitoring device 50. The central control module 10 is in communication connection with the audio device 20, the smart device 30, the application program 40, and the monitoring device 50, respectively. The intelligent device 30 is, for example, an intelligent household appliance such as an intelligent air conditioner, an intelligent refrigerator, and an air purifier. The application 40 is, for example, a retail assistant APP, and may be installed in a mobile terminal such as a mobile phone or a tablet computer. The monitoring device 50 includes, for example, various sensors such as an infrared sensor, a laser scanner, a radio frequency identifier, a smoke detector, a temperature sensor, a light sensor, and the like, and an image acquisition device; the image acquisition equipment can be monitoring equipment such as a camera module.
In this embodiment, the central control module 10 is provided with: a device binding module 101, a state machine module 102, a scene module 103, and an IoT (Internet of Things) control module 104. The device binding module 101 communicates with the application program 40, and binds the logged-in account information and the plurality of types of intelligent devices 30 with the central control module 10 according to an instruction sent by the application program 40. A plurality of state machine templates suitable for different scenes are arranged in the state machine module 102, and the state machine templates and the scene identifiers have corresponding relations; different results can be output according to different inputs in different scenes or in the same scene. The scene module 103 configures skills, instructions, or a configuration table containing voice response strategies in different scenes, and the audio device 20 may obtain different voice response strategies from the scene module 103. The IoT control module 104 supports the various control protocols of the smart devices 30, and realizes control of the smart devices 30 in different scenarios.
As shown in fig. 2, the method of the present embodiment may include the following steps:
and S110, receiving the current scene identification sent by the central control module.
And S120, determining a voice response strategy according to the current scene identifier.
And S130, initiating voice interaction according to the voice response strategy.
In step S110, the current scene identifier is determined by the central control module according to the current trigger information of the trigger information, and the current trigger information is used to characterize: and monitoring the environmental information which is collected by the equipment and reaches the triggering threshold value.
With reference to the embodiment of fig. 4, the monitoring device may collect environment information in different application scenarios in real time, where the environment information includes, for example, image information or infrared information. When the collected environment information reaches a trigger threshold, indicating that a voice interaction request exists in the current application scene; the monitoring equipment can report the acquired environmental information to the central control module as the current trigger information.
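The reporting step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, sensor identifier, and message format are hypothetical, and the threshold check stands in for whatever comparison the monitoring device actually performs.

```python
# Hypothetical sketch: the monitoring device reports environmental
# information to the central control module only when a reading reaches
# its configured trigger threshold.

def should_report(reading: float, trigger_threshold: float) -> bool:
    """True when the collected environmental information reaches the
    trigger threshold, i.e. a voice-interaction request may exist."""
    return reading >= trigger_threshold

def build_trigger_info(sensor_id: str, reading: float, threshold: float):
    """Package the reading as 'current trigger information', or return
    None if the threshold has not been reached."""
    if not should_report(reading, threshold):
        return None
    return {"sensor": sensor_id, "value": reading}

# An infrared sensor at the store entrance: only the second reading
# reaches the threshold and would be reported.
assert build_trigger_info("infrared_entrance", 0.2, 0.8) is None
assert build_trigger_info("infrared_entrance", 0.9, 0.8) == {
    "sensor": "infrared_entrance", "value": 0.9}
```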
Different application scenes can be represented by setting different scene identifiers in the central control module. In the current environment or the current scene, the central control module can determine the corresponding current scene identification by combining the acquisition result of the monitoring equipment, and sends the current scene identification to the audio equipment. The audio device receives a current scene identification.
In step S120, the audio device may learn the application scenario characterized by the current scenario identifier, so as to determine the voice response policy.
A storage location of the audio device may hold the voice response strategies for different application scenes, and the audio device obtains the corresponding strategy from that storage location. Alternatively, as shown in fig. 1, the audio device obtains the voice response strategy from the scene module 103.
Different application scenarios may correspond to different voice response strategies. For example, in a retail scenario, the voice response policy includes relevant question and answer information for the retail product. In the exhibition scene, the voice response strategy comprises the relevant question and answer information of the exhibition product performance. In the intelligent home scene, the voice response strategy comprises the indoor environment and the related question and answer information controlled by the intelligent equipment.
In step S130, the audio device may actively initiate voice interaction with the user according to the voice response policy corresponding to the current application scenario. For example, for a retail scenario, the audio device may actively interact with the user and receive response information from the user.
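Steps S110 through S130 can be sketched as a short program. This is an illustrative outline only, assuming a simple in-memory policy table; the scene identifiers, policy contents, and function names are invented for the example.

```python
# Hypothetical sketch of steps S110-S130: the audio device receives a
# scene identifier (S110), resolves a voice response policy (S120), and
# actively initiates interaction with the policy's opening content (S130).

VOICE_RESPONSE_POLICIES = {
    "retail": {"opening": "Hello, welcome!"},
    "exhibition": {"opening": "Welcome to the exhibition hall."},
}

def handle_scene_identifier(current_scene_id: str) -> str:
    # S120: determine the voice response policy from the scene identifier.
    policy = VOICE_RESPONSE_POLICIES[current_scene_id]
    # S130: actively initiate voice interaction; here we return the line
    # that would be played to the user.
    return policy["opening"]

assert handle_scene_identifier("retail") == "Hello, welcome!"
```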
In an exemplary embodiment, step S120 in this embodiment may include the following steps:
s1201, obtaining first configuration information.
S1202, according to the current scene identification, a corresponding voice response strategy in the first configuration information is determined.
In step S1201, the first configuration information includes a correspondence between application scene identifiers and voice response strategies, where a voice response strategy may be a corpus of multi-turn dialogs related to the application scene. The first configuration information may be configured and stored in the scene module of the central control module. In this embodiment, the audio device obtains the first configuration information from the scene module in the central control module.
In step S1202, after the audio device acquires the first configuration information, according to the current scene identifier, a voice response policy corresponding to the current scene identifier in the first configuration information may be determined in a traversal query manner.
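The lookup in S1201-S1202 amounts to resolving a scene identifier against a correspondence table. The sketch below assumes a plain dictionary as the first configuration information; the scene names and policy contents are hypothetical placeholders.

```python
# Sketch of S1201-S1202: the first configuration information maps
# application scene identifiers to voice response policies; the audio
# device traverses it to find the policy for the current scene.

first_configuration = {
    "retail": ["product Q&A corpus"],
    "exhibition": ["exhibit performance Q&A corpus"],
    "smart_home": ["indoor environment and device-control Q&A corpus"],
}

def determine_policy(config: dict, current_scene_id: str):
    """Traversal query: return the matching voice response policy,
    or None when the scene identifier is not configured."""
    for scene_id, policy in config.items():
        if scene_id == current_scene_id:
            return policy
    return None

assert determine_policy(first_configuration, "retail") == ["product Q&A corpus"]
assert determine_policy(first_configuration, "unknown") is None
```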
In one exemplary embodiment, the voice response policy includes: a plurality of interactive contents associated with the current scene identification.
As shown in fig. 3, in this embodiment, step S130 may include the following steps:
and S1301, determining and playing initial interactive content according to the current scene identification.
And S1302, receiving a feedback response of the user.
And S1303, determining target interactive contents corresponding to the keywords according to the keywords in the feedback response and playing the target interactive contents.
In step S1301, the audio device identifies the application scene represented by the current scene identifier, and the determined voice response strategy includes a plurality of interactive contents for the current application scene, that is, a corpus of multi-turn dialogs. Playing rules can be preset among the interactive contents; in this embodiment, the interactive content to be played is determined according to the application scene and the user's feedback response.
For example, the audio device determines the initial interactive content in the corresponding voice response strategy. The scene determined from the current trigger information indicates that a voice interaction has just started, so the initial interactive content may be a greeting. For example, if the application scene characterized by the current scene identifier is a retail scene, the initial interactive content may be "Hello, welcome".
In step S1302, the user responds based on the initial interactive content, and the audio device may receive a feedback response of the user based on the initial interactive content.
In step S1303, the audio device may parse the feedback response using a semantic recognition algorithm, extract the keyword in the feedback response, and determine the target interactive content associated with the keyword. For example, combined with the retail scene of the above steps, if the feedback response contains the keyword "educational game", the target interactive content may include the location of educational games and a product introduction.
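The keyword-to-content step can be illustrated with simple keyword spotting. This stands in for the semantic recognition algorithm mentioned above, which the patent does not specify; the corpus entries and function names are invented for the example.

```python
# Sketch of S1303: extract a keyword from the user's feedback response
# and select the associated target interactive content from the corpus.
# Plain substring matching stands in for real semantic recognition.

RETAIL_CORPUS = {
    "puzzle": "Puzzle-type games are on shelf 3; here is a short introduction...",
    "toy": "Hello, there are some children's toys in the store; feel free to try them.",
}

def select_target_content(feedback, corpus):
    """Return the interactive content for the first keyword found in the
    feedback response, or None when no keyword matches."""
    for keyword, content in corpus.items():
        if keyword in feedback.lower():
            return content
    return None

assert select_target_content("Do you have any puzzle games?",
                             RETAIL_CORPUS).startswith("Puzzle")
```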
In one example, the targeted interactive content includes at least a first interactive content and a second interactive content. Step S1303 may include the steps of:
s1303-1, determining keywords in the feedback response, and determining first interactive content. In this step, after the audio device parses out the keyword, it can determine a plurality of interactive contents associated with the keyword in the corpus. The first interactive contents may be interactive contents playing further detailed information on the keyword. For example, the keyword is "toy", and the first interactive content may be "hello, there are some children's toys in the store, and can try to play".
S1303-2, playing the first interactive content, and receiving a response of the user based on the first interactive content. In this step, for example, the first interactive content may be "hello, there are some children's toys in the store, and can try to play". The audio device receives an answer based on this content, such as the user answering as "good, with or without a puzzle.
S1303-3, determining secondary keywords in the response based on the first interactive content, and determining second interactive content. In this step, the audio device further determines that the secondary keyword is a "puzzle" based on the response. The second interactive contents may be contents related to the secondary keyword such as a category and a location including "puzzle-type toy".
In this example, the audio device may also maintain the voice interaction based on the user's feedback on the interactive content. For example, the user continues to ask how to play with a toy; the audio device continues to parse the user's speech and determine new interactive content, such as an introduction to how the toy is played.
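The multi-turn exchange in S1303-1 through S1303-3 can be sketched as a loop over user utterances. The corpus entries, fallback reply, and function names below are hypothetical; each turn reuses the same keyword matching as a stand-in for semantic parsing.

```python
# Sketch of the multi-turn dialog in S1303-1..S1303-3: each user turn is
# parsed for a keyword and the next interactive content is selected.

CORPUS = {
    "toy": "Hello, there are some children's toys in the store; feel free to try them.",
    "puzzle": "Puzzle-type toys are in aisle 2; categories include blocks and tangrams.",
    "how to play": "Here is an introduction to how this toy is played...",
}

def next_turn(user_utterance: str) -> str:
    """Pick the reply for one dialog turn by simple keyword matching."""
    for keyword, content in CORPUS.items():
        if keyword in user_utterance.lower():
            return content
    return "Sorry, could you say that again?"

# A two-turn exchange like the toy/puzzle example above.
dialog = ["Where are the toys?", "OK, do you have any puzzles?"]
replies = [next_turn(u) for u in dialog]
assert replies[0].startswith("Hello")
assert "aisle 2" in replies[1]
```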
In another example, step S1303 may further include the steps of:
s1303-5, responding to the feedback response comprises: and the control instruction and the intelligent equipment keywords send the control instruction to the central control module, and the target interactive content is played. In this step, the target interactive content includes: a reminder that the control instruction has been executed.
For example, the feedback response includes: the key word of the control instruction of opening and the key word of the intelligent equipment of air conditioner are responded by the audio equipment according to the feedback, and the control instruction of opening the air conditioner can be sent to the central control module. And the central control module controls the air conditioner to be opened according to the control instruction and sends feedback information of executed instruction. After receiving the feedback information, the audio device plays the target interactive content for the user voice, such as: and playing a prompt message of 'air conditioner is turned on'.
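The air-conditioner example can be sketched as follows. This is an assumed implementation: the keyword tables, message strings, and the simulated immediate execution are all illustrative; in the patent's flow the instruction would actually travel through the central control module before the prompt is played.

```python
# Sketch of S1303-5: when the feedback response contains both a
# control-instruction keyword and a smart-device keyword, the audio
# device forwards a control instruction and plays a confirmation prompt.

CONTROL_KEYWORDS = {"turn on": "ON", "turn off": "OFF"}
DEVICE_KEYWORDS = {"air conditioner", "air purifier"}

def parse_control_request(feedback: str):
    """Return (action, device) if both keywords are present, else None."""
    action = next((a for k, a in CONTROL_KEYWORDS.items() if k in feedback), None)
    device = next((d for d in DEVICE_KEYWORDS if d in feedback), None)
    if action and device:
        return action, device
    return None

def handle_feedback(feedback: str) -> str:
    request = parse_control_request(feedback)
    if request is None:
        return "No device control requested."
    action, device = request
    # The instruction would be sent to the central control module here;
    # we simulate immediate successful execution and build the prompt.
    state = "turned on" if action == "ON" else "turned off"
    return f"The {device} has been {state}."

assert handle_feedback("please turn on the air conditioner") == \
    "The air conditioner has been turned on."
```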
In an exemplary embodiment, after step S130, the method of the present embodiment may further include the steps of:
and S140, responding to the trigger voice termination condition, and stopping voice interaction.
In this step, when the voice termination condition is triggered in the process of the interaction between the audio device and the user, the audio device will stop the voice interaction with the user.
In a first example, the voice termination condition includes: the waiting time for a feedback response exceeds a preset duration. The preset duration may be preconfigured in the audio device, for example 5 seconds. After playing the interactive content, the audio device waits up to the preset duration for the user's feedback response; if none arrives within that time, it stops the voice interaction.
In a second example, the voice termination condition includes: the feedback response contains a preset termination keyword. Preset termination keywords that end a scene include, for example, "got it", "thank you", and "good". After receiving a feedback response containing a preset termination keyword, the audio device stops the voice interaction.
In a third example, the speech termination condition includes: a termination instruction is received. The termination instruction may be issued by a guest user or by an administrator user. The audio device may stop the voice interaction after receiving the termination instruction.
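The three termination conditions combine naturally into one check. In this sketch the timeout value and keyword set are hypothetical (the 5-second value comes from the first example above), and a boolean flag stands in for receiving a termination instruction.

```python
# Sketch combining the three voice termination conditions: timeout,
# termination keyword in the feedback response, or an explicit
# termination instruction. Any one of them stops the interaction.

TERMINATION_KEYWORDS = {"got it", "thank you", "good"}
TIMEOUT_SECONDS = 5.0

def should_terminate(seconds_waited, feedback, termination_instruction):
    if seconds_waited > TIMEOUT_SECONDS:        # first example: timeout
        return True
    if feedback and any(k in feedback.lower() for k in TERMINATION_KEYWORDS):
        return True                             # second example: keyword
    return termination_instruction              # third example: instruction

assert should_terminate(6.0, None, False)       # no response within 5 s
assert should_terminate(1.0, "OK, thank you", False)
assert not should_terminate(1.0, "tell me more", False)
```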
In an exemplary embodiment, the present disclosure further provides a voice interaction method, which is applied to a central control module, where the central control module is in communication connection with an audio device and a monitoring device in a current environment.
The monitoring device includes a sensor and/or an image acquisition device arranged in the current environment, and may be arranged at the entrance of the current environment. For example, in a retail setting, the monitoring device is placed at the doorway. The sensors may include, for example, infrared sensors, laser scanners, radio frequency identification readers, smoke detectors, temperature sensors, and light sensors; the image acquisition device may be a monitoring device such as a camera module.
In this embodiment, as shown in fig. 1, the central control module 10 includes: a device binding module 101, a state machine module 102, a scene module 103, and an IoT (Internet of Things) control module 104. The device binding module 101 communicates with the application program 40 and, according to instructions sent by the application program 40, binds the logged-in account information and the various types of smart devices 30 with the central control module 10. The state machine module 102 is provided with a plurality of state machine templates suitable for different scenes, and the state machine templates correspond to scene identifiers; different outputs can be produced for different inputs in different scenes or in the same scene. The scene module 103 is configured with skills, instructions, or configuration tables containing the voice response policies for different scenes, and the audio device 20 may obtain the different voice response policies from the scene module 103. The IoT control module 104 supports various control protocols of the smart devices 30 and implements control of the smart devices 30 in different scenarios.
As shown in fig. 4, the method in this embodiment may include the following steps:
S210, receiving current trigger information sent by the monitoring device.
S220, determining a corresponding current scene identifier according to the current trigger information.
S230, sending the current scene identifier to the audio device.
In step S210, the current trigger information characterizes environment information that is collected by the monitoring device and reaches a trigger threshold. The environment information includes, for example, image information or infrared information. When the collected environment information reaches the trigger threshold, this indicates that a voice interaction request exists in the current application scene; the monitoring device may report the collected environment information to the central control module as the current trigger information. This step may be performed based on the state machine module.
For example, when the monitoring device is an infrared sensor, the infrared sensor acquires infrared information. When the infrared information reaches a set infrared threshold, this indicates that a visitor is passing by. The monitoring device reports the infrared information that reached the threshold to the central control module, and the central control module determines that a voice interaction request exists.
For another example, when the monitoring device is a camera module, the camera module collects image information. When the image information reaches a set image-data threshold, for example when a person or a person's unlocking event is captured, this indicates that there is a visitor or an unlocking event. The monitoring device reports the image information to the central control module, and the central control module determines that a voice interaction request exists.
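The reporting behaviour in both examples reduces to a simple threshold check. A minimal sketch, assuming a numeric sensor reading and a dictionary-shaped report message (both assumptions):

```python
def make_trigger_message(sensor_type, value, threshold):
    """Return current trigger information to report to the central control
    module, or None while the collected value stays below the threshold."""
    if value < threshold:
        # Below the trigger threshold: nothing is reported
        return None
    # At or above the threshold: report the reading as current trigger information
    return {"sensor": sensor_type, "value": value}
```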
In step S220, after receiving the current trigger information, the central control module determines that the voice interaction request exists. The central control module can firstly analyze the voice interaction request, namely, analyze the current trigger information, and determine the current scene identifier or the current application scene corresponding to the current trigger information. The scene identifier may be an identifier or a number set by the central control module, and is used to represent the corresponding application scene.
For example, the present step may include the following steps:
S2201, calling the second configuration information. In this step, the second configuration information may be a configuration table pre-stored in the central control module. The second configuration information includes the correspondence between trigger information and application scene identifiers. The trigger information and the application scene identifiers may correspond one to one, that is, one kind of trigger information corresponds to one application scene identifier; alternatively, multiple kinds of trigger information may correspond to the same application scene identifier.
S2202, determining the corresponding current scene identifier in the second configuration information according to the current trigger information. In this step, the central control module may determine the current scene identifier corresponding to the current trigger information in the second configuration information by way of a table lookup or a traversal query.
In this step, different monitoring devices may also collect data for different scenes. For example, an infrared device detects whether there is a visitor, while the camera module collects the visitor type, which may be a child, a new visitor, a returning visitor, and so on. The acquisition results of different sensors may correspond to different scene identifiers.
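The table lookup in step S2202 can be sketched as a plain dictionary lookup. The trigger keys and scene identifier names below are illustrative assumptions, including a many-to-one mapping as described above:

```python
# Hypothetical second configuration information: trigger information
# (simplified to a sensor/event key) -> application scene identifier.
SECOND_CONFIG = {
    "infrared_visitor": "SCENE_EXHIBITION_WELCOME",
    "camera_child": "SCENE_CHILD_TOY",
    "sound_child": "SCENE_CHILD_TOY",   # several triggers share one identifier
    "smoke_haze": "SCENE_AIR_PURIFY",
    "door_unlock": "SCENE_HOME_WELCOME",
}

def resolve_scene_identifier(current_trigger, config=SECOND_CONFIG):
    """Table lookup as in step S2202; returns None for unknown triggers."""
    return config.get(current_trigger)
```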
In step S230, the central control module and the audio device may exchange information over the same network communication connection. For example, the central control module sends the determined current scene identifier to the audio device so that the audio device learns the application scene in time.
In this step, the central control module may further determine a configuration table corresponding to the scene identifier in the scene module according to the scene identifier. For example, the configuration table corresponding to the current scene identifier is the first configuration information.
The central control module may send the first configuration information together with the current scene identifier, or it may send the first configuration information when it receives an acquisition request from the audio device.
In addition, because the central control module is communicatively connected with the other devices in the voice interaction system, it can clearly track the result of each voice interaction as well as the real-time status or sales status of the smart devices. Combined with behavior analysis of visitors or users, product popularity can be analyzed and a reasonable retail strategy can be formulated.
In an exemplary embodiment, the method in this embodiment may further include the steps of:
and S240, receiving a control instruction sent by the audio equipment.
And S250, controlling the corresponding intelligent equipment to operate according to the control instruction.
In step S240, during the voice interaction between the audio device and the user, the user's relevant control instructions can be fed back in real time. For example, during the interaction, the user's feedback response contains a control instruction for turning on a smart device, and the audio device can report the control instruction to the IoT control module in the central control module over the communication connection.
In step S250, the IoT control module in the central control module controls the corresponding smart device to operate according to the control instruction. After the control instruction has been executed, the central control module may also send feedback information that the instruction has been executed to the audio device.
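Steps S240 and S250 amount to dispatching an instruction to a device and returning executed-instruction feedback. A minimal sketch; the instruction format, device names, and feedback message shape are all assumptions:

```python
class IoTControlModule:
    """Sketch of the IoT control module inside the central control module:
    it executes a control instruction on the matching smart device and
    returns feedback that the instruction has been executed."""

    def __init__(self):
        self.device_states = {}  # device name -> powered on?

    def execute(self, instruction):
        # instruction like {"action": "open", "device": "air conditioner"}
        if instruction["action"] == "open":
            self.device_states[instruction["device"]] = True
            # feedback information that the instruction has been executed
            return {"status": "executed", "device": instruction["device"]}
        raise ValueError("unsupported action: " + instruction["action"])
```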
To describe the above embodiments, application examples in different scenarios will be listed below.
Example 1: as shown in fig. 5, a visitor-visiting scene in an exhibition hall.
S1, the monitoring device arranged at the doorway of the exhibition hall collects environment information in real time; for example, an infrared sensor collects infrared information. When the collected infrared information reaches the trigger threshold (current trigger information), this indicates that a visitor has entered through the doorway. The monitoring device uploads the current trigger information to the central control module.
S2, the central control module calls and queries the second configuration information according to the current trigger information and determines the corresponding current scene identifier.
S3, the central control module sends the current scene identifier to the audio device.
S4, the audio device receives the current scene identifier and acquires the first configuration information from the scene module of the central control module.
S5, the audio device determines the voice response policy corresponding to the current scene identifier according to the current scene identifier and the first configuration information.
S6, the audio device initiates voice interaction. For example, it first plays the initial interactive content: "Welcome! I am the in-store intelligent assistant and can introduce and help you operate the products in the exhibition hall."
Example 2: referring to fig. 5, an air purification scenario in an indoor setting.
S1, the monitoring device arranged indoors collects environment information; for example, a smoke detector detects air information. When the collected air information reaches the trigger threshold for hazy weather (current trigger information), this indicates that the indoor air needs to be purified. The monitoring device uploads the current trigger information to the central control module.
Referring to steps S2 to S5 of example 1, the central control module communicates with the audio device: the central control module determines the current scene identifier, and the audio device determines the voice response policy corresponding to the current scene identifier. Subsequently, step S6 is executed.
S6, the audio device initiates voice interaction. For example, it plays the initial interactive content: "Owner, poor air quality has been detected. Should the air purifier be turned on?"
S7, responding according to the user's feedback, for example the feedback "good" or "OK". The audio device sends an instruction for turning on the air purifier to the central control module.
S8, the IoT control module of the central control module controls the air purifier to turn on, and the central control module sends a feedback message to the audio device.
S9, the audio device plays the target interactive content "The air purifier has been turned on" for the user, and may at the same time play product information about the air purifier.
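Steps S6 to S9 of this example can be sketched end to end as follows. The `iot_execute` callback stands in for the central control module's IoT control path, and all message texts beyond the example's prompts are assumptions:

```python
def run_air_purifier_flow(user_feedback, iot_execute):
    """Walk through steps S6-S9 of example 2; `iot_execute` is assumed to
    return "executed" when the control instruction has been carried out."""
    played = []
    # S6: initiate voice interaction with the initial interactive content
    played.append("Owner, poor air quality detected. Turn on the air purifier?")
    # S7: on affirmative feedback, send the control instruction
    if user_feedback.strip().lower() in {"good", "ok"}:
        result = iot_execute("turn on air purifier")  # S8: central control module acts
        if result == "executed":
            # S9: play the target interactive content
            played.append("The air purifier has been turned on")
    return played
```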
Example 3: referring to fig. 5, a recommendation scene in a children's toy exhibition hall.
S1, the monitoring device arranged at the retail counter or the shop door collects environment information in real time. For example, the camera module collects image information; when the collected image information reaches the trigger threshold, for instance when it contains a child's image (current trigger information), this indicates that a child visitor has entered. Alternatively, a sound sensor collects sound information; when the collected sound information reaches the trigger threshold, for instance when it matches a child-voice threshold (current trigger information), this likewise indicates that a child visitor has entered. The monitoring device uploads the current trigger information to the central control module.
Referring to steps S2 to S5 of example 1, the central control module communicates with the audio device: the central control module determines the current scene identifier, and the audio device determines the voice response policy corresponding to the current scene identifier. Subsequently, step S6 is executed.
S6, the audio device initiates voice interaction, for example by playing the initial interactive content: "There are children's toys in the store that you can try out."
S7, responding according to the user's feedback, for example feedback response 1 is "Good. Do you have educational toys?". The audio device parses out the keyword "educational toys", determines the corresponding target interactive content "Yes, you can try them in the play area", and plays it.
S8, after the monitoring device in the toy area (such as an infrared sensor) collects new trigger information indicating that the user has walked to the play area, it sends the new trigger information to the central control module. The central control module sends a new scene identifier to the audio device and can control the audio device to broadcast a welcome message and introduce how to play with each toy. The central control module may also control the smart toys to demonstrate by themselves.
Based on the communication with the central control module, the audio device can monitor scene changes, conduct multi-turn conversations with the user, and answer the user's questions. The central control module may also determine the inventory or sales of goods in the exhibition hall and inform the audio device during their communication.
Example 4: referring to fig. 5, an indoor intelligent control scenario.
S1, the monitoring device arranged at the door collects environment information; for example, when the smart door lock detects an unlocking event (current trigger information), this indicates that the user has opened the door. The monitoring device uploads the current trigger information to the central control module.
Referring to steps S2 to S5 of example 1, the central control module communicates with the audio device: the central control module determines the current scene identifier, and the audio device determines the voice response policy corresponding to the current scene identifier. Subsequently, step S6 is executed.
S6, the audio device initiates voice interaction, for example by playing the initial interactive content "Welcome home", or "Welcome home. Should the air conditioning be turned on?"
S7, turning on the air conditioning device (or air purification device) according to the user's feedback response, for example feedback response 2 is "Good". The audio device parses out the keywords identifying the control instruction and the smart device, and sends a control instruction for turning on the air conditioning device to the central control module.
S8, the central control module controls the air conditioning device to turn on and sends a feedback message to the audio device.
S9, the audio device plays the target interactive content: "The air conditioner has been turned on".
It will be appreciated that in any of the scenario examples, if the conditions for another scenario arise at the same time, a new voice interaction can likewise be started based on the central control module and the audio device.
According to the voice interaction method in the embodiments of the present disclosure, the audio device first initiates the interaction with the user and, based on the central control module, can carry out multi-turn conversations with the user. The method has strong applicability: it can be applied to retail or exhibition hall scenarios, using the audio device to provide AI customer service and shopping guidance, guiding customers to discover new smart products in the store and introducing product details. Product popularity can be analyzed through behavior analysis and sales data statistics, making it easy to adjust sales strategies. At the same time, the method offers the user a complete smart-home experience, giving the user an immersive feel and improving the product experience. In addition, it can save human-resource costs, improve business efficiency, and prolong user retention time.
In an exemplary embodiment, the present disclosure further provides a voice interaction apparatus, which is applied to an audio device communicatively connected to the central control module. As shown in fig. 6, the apparatus in this embodiment includes: a first receiving module 110, a first determining module 120, and a voice interaction module 130, and is used to implement the method shown in fig. 2. The first receiving module 110 is configured to receive the current scene identifier sent by the central control module, where the current scene identifier is determined by the central control module according to the current trigger information, and the current trigger information characterizes environment information that is collected by the monitoring device and reaches a trigger threshold. The first determining module 120 is configured to determine a voice response policy according to the current scene identifier. The voice interaction module 130 is configured to initiate a voice interaction according to the voice response policy. In this embodiment, the first determining module 120 is configured to: acquire first configuration information, where the first configuration information includes a correspondence between application scene identifiers and voice response policies; and determine the corresponding voice response policy in the first configuration information according to the current scene identifier.
In one exemplary embodiment, the voice response policy includes: a plurality of interactive contents associated with the current scene identification. Still referring to fig. 6, the apparatus in this embodiment is used to implement the method shown in fig. 3. Wherein the voice interaction module 130 is configured to: determining and playing initial interactive content according to the current scene identifier; receiving a feedback response of a user; and determining and playing the target interactive content corresponding to the keywords according to the keywords in the feedback response.
The apparatus in this embodiment further includes: a first control module, configured to stop the voice interaction in response to a voice termination condition being triggered; where the voice termination condition includes: the duration of waiting for the feedback response exceeds a preset duration, or the feedback response contains a preset termination keyword, or a termination instruction is received. The voice interaction module is further configured to: in response to the feedback response containing a control-instruction keyword and a smart-device keyword, send the control instruction to the central control module and play the target interactive content, where the target interactive content includes a prompt message that the control instruction has been executed.
In an exemplary embodiment, the present disclosure further provides a voice interaction apparatus, which is applied to a central control module, where the central control module is in communication connection with an audio device and a monitoring device in a current environment, where the monitoring device includes a sensor and/or an image capturing device disposed in the current environment. As shown in fig. 7, the apparatus of the present embodiment includes: a second receiving module 210, a second determining module 220, and a transmitting module 230. The apparatus of the present embodiment is used to implement the method as shown in fig. 4. The second receiving module 210 is configured to receive current trigger information sent by the monitoring device, where the current trigger information is used to characterize: monitoring environmental information which is collected by equipment and reaches a trigger threshold value; the second determining module 220 is configured to determine a corresponding current scene identifier according to the current trigger information; the sending module 230 is configured to send the current scene identifier to the audio device.
In this embodiment, the second determining module 220 is configured to: call second configuration information, where the second configuration information includes a correspondence between trigger information and application scene identifiers; and determine the corresponding current scene identifier in the second configuration information according to the current trigger information. The apparatus further includes: a second control module, configured to receive a control instruction sent by the audio device and control the corresponding smart device to operate according to the control instruction.
Fig. 8 is a block diagram of an electronic device. The present disclosure also provides an electronic device; for example, the device 500 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Device 500 may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 516.
The processing component 502 generally controls overall operation of the device 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 502 may include one or more processors 520 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 502 can include one or more modules that facilitate interaction between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operation at the device 500. Examples of such data include instructions for any application or method operating on device 500, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 504 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power component 506 provides power to the various components of device 500. The power components 506 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the apparatus 500.
The multimedia component 508 includes a screen that provides an output interface between the device 500 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 500 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a Microphone (MIC) configured to receive external audio signals when the device 500 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 504 or transmitted via the communication component 516. In some embodiments, audio component 510 further includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 514 includes one or more sensors for providing various aspects of status assessment for the device 500. For example, the sensor assembly 514 may detect an open/closed state of the device 500, the relative positioning of the components, such as a display and keypad of the device 500, the sensor assembly 514 may also detect a change in the position of the device 500 or a component of the device 500, the presence or absence of user contact with the device 500, orientation or acceleration/deceleration of the device 500, and a change in the temperature of the apparatus 500. The sensor assembly 514 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate communications between the device 500 and other devices in a wired or wireless manner. The device 500 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
A non-transitory computer readable storage medium, such as the memory 504 including instructions executable by the processor 520 of the device 500 to perform the method, is provided in another exemplary embodiment of the present disclosure. For example, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like. The instructions in the storage medium, when executed by a processor of the electronic device, enable the electronic device to perform the above-described method.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (18)

1. A voice interaction method is applied to an audio device, the audio device is in communication connection with a central control module, and the method comprises the following steps:
receiving a current scene identifier sent by the central control module, wherein the current scene identifier is: determined by the central control module according to current trigger information, and the current trigger information is used for representing: environment information which is collected by monitoring equipment and reaches a trigger threshold;
determining a voice response strategy according to the current scene identifier;
and initiating voice interaction according to the voice response strategy.
2. The method of claim 1, wherein the determining a voice response policy according to the current scene identifier comprises:
acquiring first configuration information, wherein the first configuration information comprises a corresponding relation between an application scene identifier and a voice response strategy;
and determining a corresponding voice response strategy in the first configuration information according to the current scene identifier.
3. The voice interaction method of claim 1, wherein the voice response policy comprises: a plurality of interactive contents associated with the current scene identification; the initiating voice interaction according to the voice response strategy comprises:
determining and playing initial interactive content according to the current scene identifier;
receiving a feedback response of a user;
and determining and playing target interactive content corresponding to the keywords according to the keywords in the feedback response.
4. The voice interaction method of claim 3, further comprising:
stopping voice interaction in response to triggering a voice termination condition;
wherein the voice termination condition comprises: the duration of waiting for the feedback response exceeds a preset duration, or the feedback response comprises a preset termination keyword, or a termination instruction is received.
5. The voice interaction method according to claim 3, wherein the determining and playing the target interactive content corresponding to the keyword according to the keyword in the feedback response comprises:
in response to the feedback response comprising a control instruction keyword and an intelligent device keyword, sending the control instruction to the central control module, and playing the target interactive content;
wherein the target interactive content comprises: a prompt message that the control instruction has been executed.
6. A voice interaction method, applied to a central control module, wherein the central control module is communicatively connected to an audio device and a monitoring device in a current environment, and the monitoring device comprises a sensor and/or an image acquisition device disposed in the current environment; the method comprises:
receiving current trigger information sent by the monitoring device, wherein the current trigger information represents environmental information, collected by the monitoring device, that reaches a trigger threshold;
determining a corresponding current scene identifier according to the current trigger information;
and sending the current scene identifier to the audio device.
7. The method of claim 6, wherein the determining the corresponding current scene identifier according to the current trigger information comprises:
retrieving second configuration information, wherein the second configuration information comprises a correspondence between trigger information and application scene identifiers;
and determining, in the second configuration information, the current scene identifier corresponding to the current trigger information.
8. The voice interaction method of claim 6, further comprising:
receiving a control instruction sent by the audio device;
and controlling a corresponding smart device to operate according to the control instruction.
9. A voice interaction apparatus, applied to an audio device communicatively connected to a central control module, the apparatus comprising:
a first receiving module, configured to receive a current scene identifier sent by the central control module, wherein the current scene identifier is determined by the central control module according to current trigger information, and the current trigger information represents environmental information, collected by a monitoring device, that reaches a trigger threshold;
a first determining module, configured to determine a voice response policy according to the current scene identifier;
and a voice interaction module, configured to initiate voice interaction according to the voice response policy.
10. The apparatus of claim 9, wherein the first determining module is configured to:
acquire first configuration information, wherein the first configuration information comprises a correspondence between application scene identifiers and voice response policies;
and determine, in the first configuration information, the voice response policy corresponding to the current scene identifier.
11. The voice interaction apparatus of claim 9, wherein the voice response policy comprises: a plurality of interactive contents associated with the current scene identifier; and the voice interaction module is configured to:
determine and play initial interactive content according to the current scene identifier;
receive a feedback response of a user;
and determine and play target interactive content corresponding to a keyword in the feedback response.
12. The voice interaction apparatus of claim 11, further comprising: a first control module, configured to stop the voice interaction in response to a voice termination condition being triggered;
wherein the voice termination condition comprises: the duration of waiting for the feedback response exceeding a preset duration, the feedback response comprising a preset termination keyword, or a termination instruction being received.
13. The voice interaction apparatus of claim 11, wherein the voice interaction module is further configured to:
in response to the feedback response comprising a control instruction and a smart device keyword, send the control instruction to the central control module and play the target interactive content;
wherein the target interactive content comprises: a prompt message indicating that the control instruction has been executed.
14. A voice interaction apparatus, applied to a central control module, wherein the central control module is communicatively connected to an audio device and a monitoring device in a current environment, and the monitoring device comprises a sensor and/or an image acquisition device disposed in the current environment; the apparatus comprises:
a second receiving module, configured to receive current trigger information sent by the monitoring device, wherein the current trigger information represents environmental information, collected by the monitoring device, that reaches a trigger threshold;
a second determining module, configured to determine a corresponding current scene identifier according to the current trigger information;
and a sending module, configured to send the current scene identifier to the audio device.
15. The apparatus of claim 14, wherein the second determining module is configured to:
retrieve second configuration information, wherein the second configuration information comprises a correspondence between trigger information and application scene identifiers;
and determine, in the second configuration information, the current scene identifier corresponding to the current trigger information.
16. The voice interaction apparatus of claim 14, further comprising: a second control module, configured to:
receive a control instruction sent by the audio device;
and control a corresponding smart device to operate according to the control instruction.
17. An electronic device, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the voice interaction method of any one of claims 1 to 5 or 6 to 8.
18. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the voice interaction method of any one of claims 1 to 5 or 6 to 8.
CN202110919033.6A 2021-08-11 2021-08-11 Voice interaction method and device, electronic equipment and storage medium Pending CN113689853A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110919033.6A CN113689853A (en) 2021-08-11 2021-08-11 Voice interaction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113689853A true CN113689853A (en) 2021-11-23

Family

ID=78579447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110919033.6A Pending CN113689853A (en) 2021-08-11 2021-08-11 Voice interaction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113689853A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107278302A (en) * 2017-03-02 2017-10-20 深圳前海达闼云端智能科技有限公司 A kind of robot interactive method and interaction robot
CN108710485A (en) * 2018-04-19 2018-10-26 珠海格力电器股份有限公司 A kind of information output method, terminal device and readable storage medium storing program for executing
CN108766423A (en) * 2018-05-25 2018-11-06 三星电子(中国)研发中心 A kind of active awakening method and device based on scene
CN108803879A (en) * 2018-06-19 2018-11-13 驭势(上海)汽车科技有限公司 A kind of preprocess method of man-machine interactive system, equipment and storage medium
CN108923808A (en) * 2018-06-05 2018-11-30 上海博泰悦臻网络技术服务有限公司 Vehicle and its car-mounted terminal and speech interaction mode active triggering method
WO2019007245A1 (en) * 2017-07-04 2019-01-10 阿里巴巴集团控股有限公司 Processing method, control method and recognition method, and apparatus and electronic device therefor
CN110154048A (en) * 2019-02-21 2019-08-23 北京格元智博科技有限公司 Control method, control device and the robot of robot
CN110941774A (en) * 2019-12-05 2020-03-31 深圳前海达闼云端智能科技有限公司 Service recommendation method
CN111367489A (en) * 2020-02-19 2020-07-03 北京字节跳动网络技术有限公司 Voice interaction method, voice device, medium and electronic device

Similar Documents

Publication Publication Date Title
CN112218103B (en) Live broadcast room interaction method and device, electronic equipment and storage medium
CN108154579B (en) Intelligent access control system capable of interacting with visitors and interaction method
CN104936304B (en) Smart machine binding method, smart machine and server
CN103997688B (en) intelligent interactive system, device and method
CN105162728B (en) Method for network access, equipment and system
CN105516754B (en) Picture display control method, device and terminal
US10334282B2 (en) Methods and devices for live broadcasting based on live broadcasting application
CN103955179A (en) Remote intelligent control method and device
CN105338389A (en) Method and apparatus for controlling intelligent television
CN112153407B (en) Live broadcast room data interaction method, related device and equipment
CN105182783A (en) Method, apparatus and terminal for controlling intelligent devices
CN109446031B (en) Control method of terminal equipment, terminal and readable storage medium
CN105607834A (en) Screen control method and apparatus as well as terminal
CN105281993B (en) Play the method and device of multimedia file
CN106406175B (en) Door opening reminding method and device
CN113573092B (en) Live broadcast data processing method and device, electronic equipment and storage medium
CN110446079A (en) Obtain method, apparatus, electronic equipment and the storage medium of viewing duration
CN105049807A (en) Method and apparatus for acquiring monitoring picture sound
CN111028835B (en) Resource replacement method, device, system and computer readable storage medium
CN106792041A (en) Content share method and device
CN106131291A (en) Information expands screen display method and device
CN104378596A (en) Method and device for conducting remote conversation with camera
CN108335135A (en) The sharing method and device of multimedia content
CN109525966B (en) Intelligent device query method and device and storage medium
CN105101121B (en) A kind of method and device that information is sent

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination