CN113539250A - Interaction method, device, system, voice interaction equipment, control equipment and medium

Info

Publication number: CN113539250A
Application number: CN202010297149.6A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 刘兆健, 葛佩, 汪贇, 李岳冰
Assignee: Alibaba Group Holding Ltd
Legal status: Pending

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F 40/00 Handling natural language data
            • G06F 40/30 Semantic analysis
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
          • G10L 15/00 Speech recognition
            • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
              • G10L 2015/223 Execution procedure of a spoken command
            • G10L 15/28 Constructional details of speech recognition systems
              • G10L 15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Abstract

Embodiments of the present application provide an interaction method, an interaction apparatus, an interaction system, a voice interaction device, a control device, an electronic device, and a computer-readable medium, relating to the field of communication technology. The system comprises a voice interaction device and a control device communicatively connected to it. The control device detects an operation performed on the control device for interacting with the voice interaction device and, based on that operation, sends a corresponding corpus text to the voice interaction device. The voice interaction device receives the corpus text sent by the control device and executes a corresponding operation instruction based on it. With the embodiments of the present application, a user can conveniently interact with a voice interaction device even when voice interaction with it is not suitable.

Description

Interaction method, device, system, voice interaction equipment, control equipment and medium
Technical Field
Embodiments of the present application relate to the field of communication technology, and in particular to an interaction method, an interaction apparatus, an interaction system, a voice interaction device, a control device, an electronic device, and a computer-readable medium.
Background
With the development of terminal technology, voice interaction devices such as smart speakers have become increasingly popular. A voice interaction device collects a user's voice data and then provides services according to the collected data, making the user's daily life more automated and intelligent. For example, when a user interacts with a smart speaker by voice and says "I want to listen to a children's story", the speaker collects the voice data through its microphone, converts it to text using automatic speech recognition, and then uses natural language processing to recognize the user's intent from the text and execute the corresponding instruction. However, in some application scenarios it is not suitable for the user to interact with the device by voice. For example, a user whose vocal cords are injured may be unable to speak for a period of time yet still wish to use the device. As another example, a child's pronunciation may not be standard, or the user may speak a dialect that the device cannot recognize. Also, in a noisy environment the accuracy of speech recognition is low. All of these scenarios degrade the user's voice interaction experience. Therefore, how to conveniently interact with a voice interaction device when voice interaction is not suitable has become a technical problem to be solved.
Disclosure of Invention
The present application aims to provide an interaction method, an interaction apparatus, an interaction system, a voice interaction device, a control device, an electronic device, and a computer-readable medium, to solve the technical problem in the prior art of how to conveniently interact with a voice interaction device when voice interaction with it is not suitable.
According to a first aspect of embodiments of the present application, an interaction method is provided. The method comprises: detecting an operation on a control device for interacting with a voice interaction device, the operation instructing the control device to send a corresponding corpus text to the voice interaction device; and sending the corresponding corpus text to the voice interaction device based on the operation, so that the voice interaction device executes a corresponding operation instruction based on the corpus text.
According to a second aspect of embodiments of the present application, there is provided an interaction apparatus. The apparatus comprises: a detection module configured to detect an operation on a control device for interacting with a voice interaction device, the operation instructing the control device to send a corresponding corpus text to the voice interaction device; and a sending module configured to send the corresponding corpus text to the voice interaction device based on the operation, so that the voice interaction device executes a corresponding operation instruction based on the corpus text.
According to a third aspect of embodiments of the present application, there is provided an interaction system. The system comprises: a voice interaction device and a control device communicatively connected to the voice interaction device, wherein the control device is configured to detect an operation on the control device for interacting with the voice interaction device and to send a corresponding corpus text to the voice interaction device based on the operation; and the voice interaction device is configured to receive the corpus text sent by the control device and to execute a corresponding operation instruction based on the corpus text.
According to a fourth aspect of embodiments of the present application, there is provided an interaction system. The system comprises: a sweeper with a voice interaction function and a button panel communicatively connected to the sweeper, wherein the button panel is configured to detect an operation on the button panel for interacting with the sweeper and to send a corpus text for sweeping the kitchen to the sweeper based on the operation; and the sweeper is configured to receive the corpus text sent by the button panel and to sweep the floor in the kitchen based on the corpus text.
According to a fifth aspect of embodiments of the present application, there is provided an interaction system. The system comprises: a voice interaction device and a control device communicatively connected to the voice interaction device, wherein the control device is configured to detect an operation on the control device for interacting with the voice interaction device, determine the operation time point corresponding to the operation, and send a corresponding corpus text to the voice interaction device based on the customized time period in which the operation time point falls; and the voice interaction device is configured to receive the corpus text sent by the control device and to execute a corresponding operation instruction based on the corpus text.
According to a sixth aspect of embodiments of the present application, there is provided a voice interaction device. The device comprises: a microphone, a loudspeaker, a circuit board, and a controller and a short-range wireless communication device arranged on the circuit board, wherein the microphone, the loudspeaker, and the short-range wireless communication device are connected to the controller through the circuit board. The microphone collects the user's voice data and sends it to the controller, and the controller forwards the voice data to a cloud, so that the cloud executes a corresponding voice instruction according to the voice data and controls the loudspeaker to play the execution result of the voice instruction. The short-range wireless communication device is configured to receive a corpus text that a control device interacting with the voice interaction device sends based on an operation on the control device, and to send the corpus text to the controller; the controller forwards the corpus text to the cloud, so that the cloud executes a corresponding operation instruction according to the corpus text and controls the loudspeaker to play the execution result of the operation instruction.
According to a seventh aspect of embodiments of the present application, a voice interaction device is provided. The device comprises: a microphone, a loudspeaker, a circuit board, and a controller and a button arranged on the circuit board, wherein the microphone, the loudspeaker, and the button are connected to the controller through the circuit board. The microphone collects the user's voice data and sends it to the controller, and the controller forwards the voice data to a cloud, so that the cloud executes a corresponding voice instruction according to the voice data and controls the loudspeaker to play the execution result of the voice instruction. The button is configured to detect an operation on the button and to send a corresponding corpus text to the controller based on the operation; the controller forwards the corpus text to the cloud, so that the cloud executes a corresponding operation instruction according to the corpus text and controls the loudspeaker to play the execution result of the operation instruction.
According to an eighth aspect of embodiments of the present application, there is provided a control device. The device comprises: a key, and a controller configured to send a corresponding corpus text to a voice interaction device when it detects that the key is pressed, so that the voice interaction device executes a corresponding operation instruction based on the corpus text.
According to a ninth aspect of embodiments of the present application, there is provided a control device. The device comprises: a sensor for detecting an operation on the control device; and a controller communicatively connected to the sensor, configured to, when the sensor detects an operation on the control device, send a corresponding corpus text to a voice interaction device based on the operation, so that the voice interaction device executes a corresponding operation instruction based on the corpus text.
According to a tenth aspect of embodiments of the present application, there is provided a control device. The device comprises: an input device configured to determine the corpus text input by a user based on the user's corpus text input operation and to send that corpus text to a controller communicatively connected to the input device; and the controller, configured to send the corpus text input by the user to a voice interaction device, so that the voice interaction device executes a corresponding operation instruction based on that corpus text.
According to an eleventh aspect of embodiments of the present application, there is provided an electronic apparatus including: one or more processors; a computer readable medium configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the interaction method as described in the first aspect of the embodiments above.
According to a twelfth aspect of embodiments of the present application, there is provided a computer-readable medium, on which a computer program is stored, which when executed by a processor, implements the interaction method as described in the first aspect of the embodiments above.
According to the interaction scheme provided by the embodiments of the present application, a control device communicatively connected to a voice interaction device detects an operation on the control device for interacting with the voice interaction device and sends a corresponding corpus text to the voice interaction device based on the operation; the voice interaction device executes a corresponding operation instruction based on the received corpus text. Compared with other existing approaches, because the user's intent reaches the voice interaction device as text triggered by a simple operation on the control device, the user can conveniently interact with the voice interaction device even when voice interaction with it is not suitable.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1A is a schematic structural diagram of an interactive system according to the first embodiment of the present application;
FIG. 1B is a schematic diagram of an interaction process according to the first embodiment of the present application;
FIG. 2A is a schematic structural diagram of an interactive system according to the second embodiment of the present application;
FIG. 2B is a schematic diagram of an interaction process according to the second embodiment of the present application;
FIG. 3 is a schematic structural diagram of an interactive system according to the third embodiment of the present application;
FIG. 4A is a schematic structural diagram of a voice interaction device according to the fourth embodiment of the present application;
FIG. 4B is a schematic diagram of an interaction process according to the fourth embodiment of the present application;
FIG. 5 is a schematic structural diagram of a voice interaction device according to the fifth embodiment of the present application;
FIG. 6A is a schematic structural diagram of a control device according to the sixth embodiment of the present application;
FIG. 6B is a schematic diagram of an interaction process according to the sixth embodiment of the present application;
FIG. 7A is a schematic structural diagram of a control device according to the seventh embodiment of the present application;
FIG. 7B is a schematic diagram of an interaction process according to the seventh embodiment of the present application;
FIG. 8A is a schematic structural diagram of a control device according to the eighth embodiment of the present application;
FIG. 8B is a schematic diagram of an interaction process according to the eighth embodiment of the present application;
FIG. 9 is a flowchart of the steps of an interaction method according to the ninth embodiment of the present application;
FIG. 10 is a schematic structural diagram of an interaction apparatus according to the tenth embodiment of the present application;
FIG. 11 is a schematic structural diagram of an electronic device according to the eleventh embodiment of the present application;
FIG. 12 is a schematic diagram of the hardware structure of an electronic device according to the twelfth embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Referring to FIG. 1A, a schematic structural diagram of an interactive system according to the first embodiment of the present application is shown. Specifically, the interactive system provided in this embodiment includes: a voice interaction device 10 and a control device 20 communicatively connected to the voice interaction device 10, where the control device 20 is configured to detect an operation on the control device 20 for interacting with the voice interaction device and to send a corresponding corpus text to the voice interaction device 10 based on the operation; and the voice interaction device 10 is configured to receive the corpus text sent by the control device 20 and to execute a corresponding operation instruction based on the corpus text.
In the embodiments of the present application, the voice interaction device 10 includes at least one of the following: a sound box with a voice interaction function, a television with a voice interaction function, and a mobile phone terminal with a voice interaction function. The voice interaction function can be understood as the ability of a user to interact with the device through voice. The control device 20 can be understood as a device that provides the user with an access point for controlling the voice interaction device; it may be a button panel, a child's toy, distributed buttons, a tablet, and so on. The control device 20 interacts with the voice interaction device 10 through at least one of the following communication means: near field communication (NFC), Bluetooth, Zigbee, or a wireless local area network. The operation on the control device 20 may be a pressing operation on a key of the control device 20, a sending operation on a corpus text input by the user at the control device 20, or the like. A corpus text can be understood as text characterizing the user's intent, such as "play music" or "listen to a children's story". It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, when the voice interaction device 10 executes a corresponding operation instruction based on the corpus text, it performs semantic analysis on the corpus text to obtain a semantic analysis result, matches an operation instruction against that result, and then executes the matched instruction. To a machine, text is merely a string of characters; the meaning it expresses must be determined through semantic recognition before the content of the corpus text can be acted upon. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
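For concreteness, the following is a minimal Python sketch of this analyze-match-execute flow. The patent does not specify how semantic analysis is implemented; the keyword-based matcher, the intent table, and all names here are illustrative assumptions.

```python
from typing import Optional

# Hypothetical table mapping semantic patterns to operation instructions.
INTENT_TABLE = {
    "play music": "PLAY_MUSIC",
    "children's story": "PLAY_CHILDREN_STORY",
    "turn off the light": "LIGHT_OFF",
}

def semantic_analysis(corpus_text: str) -> Optional[str]:
    """Toy stand-in for semantic analysis: keyword matching over the text."""
    text = corpus_text.lower().strip()
    for pattern, instruction in INTENT_TABLE.items():
        if pattern in text:
            return instruction
    return None

def execute_corpus_text(corpus_text: str) -> None:
    """Analyze the corpus text, match an operation instruction, execute it."""
    instruction = semantic_analysis(corpus_text)
    if instruction is None:
        print("no matching operation instruction for:", corpus_text)
    else:
        print("executing:", instruction)  # e.g. start music playback

execute_corpus_text("play music")  # prints: executing: PLAY_MUSIC
```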
In a specific example, the control device includes a first button, arranged on a panel, for sending a corpus text, and a second button for setting the corpus text corresponding to the first button; the second button sets that corpus text based on how long the user presses it. In this way, the corpus text bound to the first button can be configured through the press duration of the second button. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, when the user's press on the second button reaches 100 ms, the corpus text corresponding to the first button is set to "play music"; when the press reaches 500 ms, it is set to "listen to a children's story". It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, the control device likewise includes the first button for sending a corpus text and a second button for setting the corpus text corresponding to the first button, but here the second button sets that corpus text based on how many times the user presses it in succession. In this way, the corpus text bound to the first button can be configured through the number of consecutive presses of the second button. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, consecutive pressing means that the interval between two adjacent presses of the second button is less than a preset duration. When the user presses the second button three times in succession, the corpus text corresponding to the first button is set to "play music"; when the user presses it twice in succession, it is set to "listen to a children's story". A sketch of both configuration schemes follows. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
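The following sketch combines the two schemes above: binding the first button's corpus text from the second button's press duration, or from its consecutive press count. The duration thresholds and press counts follow the examples in the text; the 1-second gap, variable names, and callback structure are assumptions.

```python
import time

# Duration thresholds (longest first) and consecutive-press counts from the
# examples above; the surrounding scaffolding is illustrative.
DURATION_MAP = [(0.5, "listen to a children's story"), (0.1, "play music")]
COUNT_MAP = {3: "play music", 2: "listen to a children's story"}

first_button_text = None      # corpus text currently bound to the first button
_press_times = []             # timestamps of the current run of presses

def on_second_button_release(press_duration_s: float) -> None:
    """Duration-based setting: the longest satisfied threshold wins."""
    global first_button_text
    for threshold, text in DURATION_MAP:
        if press_duration_s >= threshold:
            first_button_text = text
            return

def on_second_button_press(max_gap_s: float = 1.0) -> None:
    """Count-based setting: presses within max_gap_s of each other form a run."""
    global first_button_text
    now = time.monotonic()
    if _press_times and now - _press_times[-1] > max_gap_s:
        _press_times.clear()              # gap too long: start a new run
    _press_times.append(now)
    if len(_press_times) in COUNT_MAP:
        first_button_text = COUNT_MAP[len(_press_times)]
```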
In one particular example, the control device 20 is a child's toy. The toy detects operations on itself through its built-in nine-axis sensor and sends a corresponding corpus text to the voice interaction device 10 based on the operation. The operation on the toy includes at least one of the following: waving the toy upward, waving it downward, waving it leftward, waving it rightward, or drawing a circle with it. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, when the nine-axis sensor detects an upward wave, the toy sends the corpus text "play a children's song" to the voice interaction device 10; a downward wave sends "play a children's story"; a leftward wave sends "play a children's poem"; a rightward wave sends "play children's piano music"; and a circle-drawing gesture sends "play children's light music". It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
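A minimal sketch of this gesture-to-corpus-text mapping follows. Real gesture classification from a nine-axis sensor (accelerometer, gyroscope, and magnetometer) requires signal processing that the patent does not detail, so classify_gesture() is left as an assumed stand-in.

```python
GESTURE_TO_CORPUS = {
    "wave_up": "play a children's song",
    "wave_down": "play a children's story",
    "wave_left": "play a children's poem",
    "wave_right": "play children's piano music",
    "draw_circle": "play children's light music",
}

def classify_gesture(imu_samples) -> str:
    """Placeholder: turn raw nine-axis samples into a gesture label."""
    raise NotImplementedError("gesture recognition is device-specific")

def on_motion(imu_samples, send_to_device) -> None:
    """Map a detected gesture to its corpus text and transmit it."""
    corpus_text = GESTURE_TO_CORPUS.get(classify_gesture(imu_samples))
    if corpus_text is not None:
        send_to_device(corpus_text)   # e.g. over Bluetooth to the speaker
```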
In a specific example, the control device 20 includes a plurality of distributed buttons located in different areas. Each distributed button detects operations on itself for interacting with the voice interaction device 10 and sends a corresponding corpus text to the voice interaction device 10 based on the operation; the operation may be a pressing operation on the button. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, the distributed buttons located in different areas may include a button in the living room, a button in the master bedroom, a button in the secondary bedroom, and so on. When the living room button detects a pressing operation, it sends the corresponding corpus text "turn on the living room light" to the voice interaction device 10. When the master bedroom button detects a pressing operation, it sends the corresponding corpus text "close the master bedroom curtain" to the voice interaction device 10. When the secondary bedroom button detects a pressing operation, it sends the corresponding corpus text "turn on the smart mirror" to the voice interaction device 10. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In some optional embodiments, the system further comprises a terminal device communicatively connected to the distributed buttons; a client installed on the terminal device sets a corresponding corpus text for each distributed button. In this way, the corpus texts of the distributed buttons can be configured individually through the client, as sketched after the next example. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, for the button in the living room the client sets the corpus text "turn on the living room light"; for the button in the master bedroom it sets "close the master bedroom curtain"; and for the button in the secondary bedroom it sets "turn on the smart mirror". It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
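The sketch below models a distributed button with a configurable corpus text and a terminal client that assigns a text per button, after which a press forwards the bound text. Class names, the transport, and the registry are all illustrative assumptions.

```python
class DistributedButton:
    """One button in one area; it only stores and forwards its corpus text."""
    def __init__(self, button_id: str, send_to_device):
        self.button_id = button_id
        self.corpus_text = None
        self._send = send_to_device      # e.g. a Zigbee/Bluetooth transmit call

    def on_press(self) -> None:
        if self.corpus_text is not None:
            self._send(self.corpus_text)

class TerminalClient:
    """Client on the terminal device that configures each button's text."""
    def __init__(self, buttons: dict):
        self.buttons = buttons           # button_id -> DistributedButton

    def set_corpus_text(self, button_id: str, text: str) -> None:
        self.buttons[button_id].corpus_text = text

# Example configuration following the text above:
# client.set_corpus_text("living_room", "turn on the living room light")
# client.set_corpus_text("master_bedroom", "close the master bedroom curtain")
# client.set_corpus_text("secondary_bedroom", "turn on the smart mirror")
```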
In some optional embodiments, the control device 20 includes a handwriting pad. The handwriting pad detects a corpus text input operation on the pad to generate the input corpus text, and after generating it, detects a sending operation on that text so as to send it to the voice interaction device. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, when the handwriting pad detects a corpus text input operation, it generates the input corpus text based on that operation. It then watches for a sending operation by the user on the generated text, and if one is detected, sends the text to the voice interaction device, as sketched below. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
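A small sketch of this handwriting-pad flow, assuming a recognize_strokes() handwriting recognizer that the patent does not describe; all names are illustrative.

```python
def recognize_strokes(strokes) -> str:
    """Placeholder for the pad's handwriting recognition."""
    raise NotImplementedError("handwriting recognition is device-specific")

class HandwritingPad:
    def __init__(self, send_to_device):
        self._send = send_to_device
        self._pending_text = None

    def on_input_operation(self, strokes) -> None:
        """Generate the input corpus text from the user's handwriting."""
        self._pending_text = recognize_strokes(strokes)

    def on_send_operation(self) -> None:
        """Send the generated corpus text to the voice interaction device."""
        if self._pending_text is not None:
            self._send(self._pending_text)
            self._pending_text = None
```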
In a specific example, as shown in FIG. 1B, the voice interaction device 10 is a sound box and the control device 20 is a button panel, communicatively connected to the sound box through a wireless local area network or Bluetooth. The corpus texts of three buttons on the panel are customized by the user: button 1 may be set to "children's story", button 2 to "announce the time", and button 3 to "turn off the light". The user does not need to speak to the device; pressing the corresponding button is enough for the sound box to execute the corresponding operation instruction. Specifically, when the user presses button 1, the panel sends the corpus text "children's story" to the sound box; the sound box passes the text through unchanged to the cloud it is communicatively connected to; the cloud performs semantic analysis on the text to obtain a semantic analysis result, matches an operation instruction against that result, and then executes the matched instruction (playing a children's story). Pressing button 2 likewise sends "announce the time" through the sound box to the cloud, which analyzes it, matches the instruction, and executes it (announcing the time). Pressing button 3 sends "turn off the light" through the sound box to the cloud, which analyzes it, matches the instruction, and executes it (turning off the light). It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
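An end-to-end sketch of this example follows: the panel sends the pressed button's corpus text, the sound box forwards it untouched, and the cloud analyzes and executes it. The classes, the in-cloud intent table, and all names are assumptions, not the patent's specified implementation.

```python
class Cloud:
    """Cloud side: semantic analysis, instruction matching, execution."""
    INTENTS = {
        "children's story": "PLAY_CHILDREN_STORY",
        "announce the time": "ANNOUNCE_TIME",
        "turn off the light": "LIGHT_OFF",
    }

    def handle_corpus_text(self, text: str) -> str:
        instruction = self.INTENTS.get(text.strip().lower())
        return f"executed {instruction}" if instruction else "no match"

class SoundBox:
    """The sound box does not interpret the text; it only forwards it."""
    def __init__(self, cloud: Cloud):
        self.cloud = cloud

    def on_corpus_text(self, text: str) -> str:
        return self.cloud.handle_corpus_text(text)   # pass-through to cloud

class ButtonPanel:
    def __init__(self, sound_box: SoundBox, bindings: dict):
        self.sound_box = sound_box
        self.bindings = bindings                     # button id -> corpus text

    def press(self, button_id: int) -> str:
        return self.sound_box.on_corpus_text(self.bindings[button_id])

panel = ButtonPanel(SoundBox(Cloud()),
                    {1: "children's story", 2: "announce the time",
                     3: "turn off the light"})
print(panel.press(1))   # -> executed PLAY_CHILDREN_STORY
```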
The interaction scheme provided by this embodiment expands the interaction modes of the voice interaction device: the control device delivers the user's intent to the voice interaction device as text, and the voice interaction device applies natural language processing to that text and executes the corresponding instruction. The approach is general: different control devices can customize their own corpus texts, and the voice interaction device needs no additional adaptation.
According to the interaction scheme provided by the embodiments of the present application, a control device communicatively connected to a voice interaction device detects an operation on the control device for interacting with the voice interaction device and sends a corresponding corpus text to the voice interaction device based on the operation; the voice interaction device executes a corresponding operation instruction based on the received corpus text. Compared with other existing approaches, this allows the user to conveniently interact with the voice interaction device even when voice interaction with it is not suitable.
Referring to FIG. 2A, a schematic structural diagram of an interactive system according to the second embodiment of the present application is shown. Specifically, the interactive system provided in this embodiment includes: a sweeper 30 with a voice interaction function and a button panel 40 communicatively connected to the sweeper 30, where the button panel 40 is configured to detect an operation on the button panel 40 for interacting with the sweeper 30 and to send a corpus text for sweeping the kitchen to the sweeper 30 based on the operation; and the sweeper 30 is configured to receive the corpus text sent by the button panel 40 and to sweep the floor in the kitchen based on the corpus text. In this way, a user can conveniently interact with the sweeper even when voice interaction with it is not suitable. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In the embodiments of the present application, the voice interaction function can be understood as the ability of a user to interact with the device through voice. The button panel 40 interacts with the sweeper 30 through at least one of the following communication means: near field communication (NFC), Bluetooth, Zigbee, or a wireless local area network. The operation on the button panel 40 may be a pressing operation on a button of the button panel 40. A corpus text can be understood as text characterizing the user's intent, for example "sweep the floor in the kitchen". It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, when the sweeper 30 sweeps the kitchen based on the corpus text, it performs semantic analysis on the corpus text to obtain a semantic analysis result, matches an operation instruction against that result, and then executes the matched instruction. To a machine, text is merely a string of characters; its meaning must be determined through semantic recognition before the content of the corpus text can be acted upon. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, as shown in FIG. 2B, the button panel is communicatively connected to the sweeper through a wireless local area network or Bluetooth, and the corpus texts of three buttons are customized by the user: button 1 may be set to "sweep the floor in the kitchen", button 2 to "sweep the floor in the living room", and button 3 to "sweep the floor in the bedroom". The user does not need to speak to the sweeper; pressing the corresponding button is enough for it to execute the corresponding operation instruction. Specifically, when the user presses button 1, the panel sends the corpus text "sweep the floor in the kitchen" to the sweeper; the sweeper receives it through its short-range wireless communication device and forwards it to the cloud it is communicatively connected to; the cloud performs semantic analysis on the text, matches an operation instruction against the analysis result, and executes the matched instruction (sweeping the kitchen). Pressing button 2 or button 3 works the same way with "sweep the floor in the living room" and "sweep the floor in the bedroom" respectively. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
Referring to FIG. 3, a schematic structural diagram of an interactive system according to the third embodiment of the present application is shown. Specifically, the interactive system provided in this embodiment includes: a voice interaction device 10 and a control device 20 communicatively connected to the voice interaction device 10, where the control device 20 is configured to detect an operation on the control device for interacting with the voice interaction device, determine the operation time point corresponding to the operation, and send a corresponding corpus text to the voice interaction device based on the customized time period in which the operation time point falls; and the voice interaction device 10 is configured to receive the corpus text sent by the control device and to execute a corresponding operation instruction based on the corpus text.
In a specific example, before the control device 20 is put into use, the corpus text corresponding to each customized time period is configured in advance. For example, when the control device 20 is a button panel communicatively connected to a sweeper, the corpus text for the customized period from 9:00 to 11:00 may be set to "sweep the floor in the kitchen", the period from 14:00 to 17:00 to "sweep the floor in the living room", and the period from 19:00 to 20:00 to "sweep the floor in the bedroom". Once configured, the control device 20 detects an operation on itself for interacting with the voice interaction device and determines the operation time point. If that time point falls within 9:00 to 11:00, the control device 20 sends the corpus text "sweep the floor in the kitchen" to the voice interaction device 10; if it falls within 14:00 to 17:00, it sends "sweep the floor in the living room". It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
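A minimal sketch of this time-customized lookup follows, assuming the three periods from the example; the datetime handling and names are illustrative.

```python
from datetime import datetime, time
from typing import Optional

# Customized periods and their corpus texts, following the example above.
PERIOD_TABLE = [
    (time(9, 0),  time(11, 0), "sweep the floor in the kitchen"),
    (time(14, 0), time(17, 0), "sweep the floor in the living room"),
    (time(19, 0), time(20, 0), "sweep the floor in the bedroom"),
]

def corpus_text_for(operation_time: datetime) -> Optional[str]:
    """Return the corpus text whose customized period contains the operation."""
    t = operation_time.time()
    for start, end, text in PERIOD_TABLE:
        if start <= t < end:
            return text
    return None   # outside every configured period: send nothing

corpus_text_for(datetime(2020, 4, 16, 9, 30))  # -> "sweep the floor in the kitchen"
```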
According to the interaction scheme provided by the embodiments of the present application, a control device communicatively connected to a voice interaction device detects an operation on the control device for interacting with the voice interaction device, determines the operation time point corresponding to the operation, and sends a corresponding corpus text to the voice interaction device based on the customized time period in which that time point falls; the voice interaction device executes a corresponding operation instruction based on the received corpus text. Compared with other existing approaches, this allows the user to conveniently interact with the voice interaction device even when voice interaction with it is not suitable.
Referring to FIG. 4A, a schematic structural diagram of a voice interaction device according to the fourth embodiment of the present application is shown. Specifically, the voice interaction device 10 provided in this embodiment includes: a microphone 11, a loudspeaker 12, a circuit board 13, and a controller 14 and a short-range wireless communication device 15 arranged on the circuit board 13, where the microphone 11, the loudspeaker 12, and the short-range wireless communication device 15 are connected to the controller 14 through the circuit board 13. The microphone 11 collects the user's voice data and sends it to the controller 14; the controller 14 forwards the voice data to the cloud 30, so that the cloud 30 executes a corresponding voice instruction according to the voice data and controls the loudspeaker 12 to play the execution result. The short-range wireless communication device 15 is configured to receive a corpus text that the control device 20 interacting with the voice interaction device 10 sends based on an operation on the control device 20, and to pass it to the controller 14; the controller 14 forwards the corpus text to the cloud 30, so that the cloud 30 executes a corresponding operation instruction according to the corpus text and controls the loudspeaker 12 to play the execution result. In this way, receiving corpus texts through the short-range wireless communication device supplements the device's voice interaction mode. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In one specific example, the short-range wireless communication device includes at least one of: a Bluetooth device, a Zigbee device, a near field communication (NFC) device, or a wireless local area network device. The controller may be a voice chip or a processor with voice-signal and data processing capability and a control function; it may specifically be a central processing unit, or another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
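A sketch of the controller's two forwarding paths in this embodiment follows: voice data from the microphone and corpus texts from the short-range radio both go to the cloud, whose result is played through the loudspeaker. The cloud API, the stubs, and all names are illustrative assumptions.

```python
class Controller:
    """Device-side controller: forwards input to the cloud, plays the result."""
    def __init__(self, cloud, loudspeaker):
        self.cloud = cloud
        self.loudspeaker = loudspeaker

    def on_voice_data(self, pcm_audio: bytes) -> None:
        # Voice path: the cloud runs ASR, then NLP, then executes.
        result = self.cloud.process_voice(pcm_audio)
        self.loudspeaker.play(result)

    def on_corpus_text(self, text: str) -> None:
        # Text path: received via Bluetooth/Zigbee/NFC/WLAN from the control
        # device; the cloud skips ASR and goes straight to semantic analysis.
        result = self.cloud.process_text(text)
        self.loudspeaker.play(result)

class _StubCloud:
    def process_voice(self, pcm_audio): return "result of voice instruction"
    def process_text(self, text): return f"result of {text!r}"

class _StubSpeaker:
    def play(self, result): print("playing:", result)

Controller(_StubCloud(), _StubSpeaker()).on_corpus_text("play music")
```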
In a specific example, as shown in FIG. 4B, the voice interaction device 10 is a robot and the control device 20 is a button panel, communicatively connected to the robot through a wireless local area network or Bluetooth. The corpus texts of three buttons on the panel are customized by the user: button 1 may be set to "play music", button 2 to "play a poem", and button 3 to "play a novel". The user does not need to speak to the device; pressing the corresponding button is enough for the robot to execute the corresponding operation instruction. Specifically, when the user presses button 1, the panel sends the corpus text "play music" to the robot; the robot receives it through the short-range wireless communication device and forwards it to the cloud it is communicatively connected to; the cloud performs semantic analysis on the text, matches an operation instruction against the analysis result, and executes the matched instruction (playing music). Pressing button 2 or button 3 works the same way with "play a poem" and "play a novel" respectively. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
Referring to FIG. 5, a schematic structural diagram of a voice interaction device according to the fifth embodiment of the present application is shown. Specifically, the voice interaction device 10 provided in this embodiment includes: a microphone 11, a loudspeaker 12, a circuit board 13, and a controller 14 and a button 16 arranged on the circuit board 13, where the microphone 11, the loudspeaker 12, and the button 16 are connected to the controller 14 through the circuit board 13. The microphone 11 collects the user's voice data and sends it to the controller 14; the controller 14 forwards the voice data to the cloud 30, so that the cloud 30 executes a corresponding voice instruction according to the voice data and controls the loudspeaker 12 to play the execution result. The button 16 is configured to detect an operation on itself and to send a corresponding corpus text to the controller 14 based on the operation; the controller 14 forwards the corpus text to the cloud 30, so that the cloud 30 executes a corresponding operation instruction according to the corpus text and controls the loudspeaker 12 to play the execution result. In this way, the on-device button supplements the device's voice interaction mode. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, the controller may be a voice chip or a processor with voice-signal and data processing capability and a control function; it may specifically be a central processing unit, or another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
Referring to FIG. 6A, a schematic structural diagram of a control device according to the sixth embodiment of the present application is shown. Specifically, the control device provided in this embodiment includes: a key 21, and a controller 22 configured to send a corresponding corpus text to a voice interaction device when it detects that the key 21 is pressed, so that the voice interaction device executes a corresponding operation instruction based on the corpus text. This allows convenient interaction with the voice interaction device. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, the controller may be a chip or a processor with data processing capability and a control function; it may specifically be a central processing unit, or another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, as shown in FIG. 6B, the voice interaction device is a sound box and the control device is a button panel, communicatively connected to the sound box through a wireless local area network or Bluetooth. The corpus texts of three buttons on the panel are customized by the user: button 1 may be set to "open the curtain", button 2 to "turn on the desk lamp", and button 3 to "turn on the water heater". The user does not need to speak to the device; pressing the corresponding button is enough for the sound box to execute the corresponding operation instruction. Specifically, when the user presses button 1, the panel sends the corpus text "open the curtain" to the sound box; the sound box forwards it to the cloud it is communicatively connected to; the cloud performs semantic analysis on the text, matches an operation instruction against the analysis result, and executes the matched instruction (opening the curtain). Pressing button 2 or button 3 works the same way with "turn on the desk lamp" and "turn on the water heater" respectively. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
Referring to fig. 7A, a schematic structural diagram of a manipulation device according to a seventh embodiment of the present application is shown. Specifically, the manipulation device provided in this embodiment includes: a sensor 23, configured to detect an operation on the manipulation device; and a controller 22 communicatively connected with the sensor 23, configured to send, when the sensor 23 detects the operation on the manipulation device, a corresponding corpus text to the voice interaction device based on the operation, so that the voice interaction device executes a corresponding operation instruction based on the corpus text. In this way, the voice interaction device can be interacted with conveniently. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In one particular example, the sensor 23 may be a nine-axis sensor. The controller may be a chip or a processor with data processing capability and a control function, specifically a central processing unit; it may also be another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, as shown in fig. 7B, the voice interaction device is a sound box, and the manipulation device is a child toy communicatively connected with the sound box through a wireless local area network or Bluetooth. Corpus texts corresponding to operations on the child toy may be preconfigured. For example, the corpus text corresponding to waving the child toy upward may be set to "play a children's story", and the corpus text corresponding to waving the child toy downward may be set to "play a children's poem". A child does not need to speak to the voice interaction device; performing the corresponding operation on the toy is enough for the sound box to execute the corresponding operation instruction. Specifically, when the child toy is waved upward, the toy sends the corresponding corpus text "play a children's story" to the sound box; the sound box forwards the received corpus text to the cloud end communicatively connected with it; the cloud end performs semantic analysis on the corpus text to obtain a semantic analysis result, matches an operation instruction to the result, and then executes the matched operation instruction (playing a children's story). When the child toy is waved downward, the corpus text "play a children's poem" passes through the same toy-sound box-cloud pipeline, and the matched operation instruction (playing a children's poem) is executed. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
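As an illustration of how raw nine-axis readings might be reduced to the two configured corpus texts, here is a hedged Python sketch. The threshold heuristic in classify_gesture and all names are hypothetical; real gesture recognition over accelerometer/gyroscope streams is considerably more involved, and the embodiment does not prescribe a particular algorithm.

from typing import List, Optional

GESTURE_CORPUS = {
    "wave_up": "play a children's story",
    "wave_down": "play a children's poem",
}

def classify_gesture(accel_z_samples: List[float]) -> Optional[str]:
    # Toy heuristic: a strong positive vertical-acceleration peak is
    # taken as an upward wave, a strong negative one as a downward wave.
    peak = max(accel_z_samples, key=abs)
    if peak > 5.0:
        return "wave_up"
    if peak < -5.0:
        return "wave_down"
    return None  # no recognizable gesture in this window

def toy_on_sensor_window(accel_z_samples: List[float]) -> None:
    # Child-toy side: a recognized gesture is translated into its
    # preconfigured corpus text and sent on to the sound box.
    gesture = classify_gesture(accel_z_samples)
    if gesture is not None:
        print("toy sends corpus text:", GESTURE_CORPUS[gesture])

toy_on_sensor_window([0.2, 2.9, 7.4, 1.1])     # upward wave -> children's story
toy_on_sensor_window([-0.3, -6.8, -2.0, 0.0])  # downward wave -> children's poem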
Referring to fig. 8A, a schematic structural diagram of a manipulation device according to an eighth embodiment of the present application is shown. Specifically, the manipulation device provided in this embodiment includes: an input device, used for determining the corpus text input by the user based on the user's corpus text input operation and sending the input corpus text to a controller communicatively connected with the input device; and the controller, used for sending the corpus text input by the user to the voice interaction device, so that the voice interaction device executes a corresponding operation instruction based on that corpus text. In this way, the voice interaction device can be interacted with conveniently. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In one specific example, the input device may be a touch display screen. The controller may be a chip or a processor with data processing capability and a control function, specifically a central processing unit; it may also be another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, as shown in fig. 8B, the voice interaction device is a sound box, and the manipulation device is a handwriting board communicatively connected with the sound box through a wireless local area network or Bluetooth. The user does not need to speak to the voice interaction device; performing the corresponding corpus text input operation on the handwriting board is enough for the sound box to execute the corresponding operation instruction. Specifically, when the user performs a corpus text input operation on the handwriting board, the handwriting board generates the input corpus text from that operation and sends it to the sound box; the sound box forwards the received corpus text to the cloud end communicatively connected with it; the cloud end performs semantic analysis on the corpus text to obtain a semantic analysis result, matches an operation instruction to the result, and executes the matched operation instruction. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
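The handwriting-board path differs from the button path in that the corpus text is generated at input time rather than preconfigured. The following hedged Python sketch separates the two operations the embodiment describes, generating the input corpus text and sending it; recognize_handwriting and the stroke format are hypothetical placeholders for on-device handwriting recognition.

from typing import List, Optional, Tuple

def recognize_handwriting(strokes: List[Tuple[float, float]]) -> str:
    # Placeholder: a real handwriting board would run handwriting
    # recognition over the stroke coordinates here.
    return "play light music"

class HandwritingBoard:
    def __init__(self) -> None:
        self.input_corpus_text: Optional[str] = None

    def on_input_operation(self, strokes: List[Tuple[float, float]]) -> None:
        # Corpus text input operation -> generate the input corpus text.
        self.input_corpus_text = recognize_handwriting(strokes)

    def on_send_operation(self) -> None:
        # Sending operation -> ship the generated corpus text to the
        # sound box, then clear the pending text.
        if self.input_corpus_text is not None:
            print("board sends corpus text:", self.input_corpus_text)
            self.input_corpus_text = None

board = HandwritingBoard()
board.on_input_operation([(0.0, 0.0), (1.0, 1.0)])
board.on_send_operation()  # prints: board sends corpus text: play light music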
Referring to fig. 9, a flowchart illustrating steps of an interaction method according to a ninth embodiment of the present application is shown.
Specifically, the interaction method provided by this embodiment includes the following steps (performed by the control device):
in step S901, an operation of a manipulation device for interacting with the voice interaction device is detected.
In an embodiment of the present application, the voice interaction device includes at least one of: a sound box with a voice interaction function, a television with a voice interaction function, and a mobile phone terminal with a voice interaction function. The voice interaction function can be understood as a function through which a user interacts with the device by voice. The control device can be understood as a device that provides the user with a control entry to the voice interaction device; it may be a button panel, a child toy, a distributed button, a handwriting board, or the like. The control device interacts with the voice interaction device through at least one of the following communication modes: a near field communication mode, Bluetooth, ZigBee, or a wireless local area network. The operation instructs the control device to send a corresponding corpus text to the voice interaction device; it may be, for example, a pressing operation on a key of the control device or a sending operation for a corpus text input by the user on the control device. The corpus text can be understood as text characterizing the user's intention, such as "play music" or "listen to a children's story". It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In some optional embodiments, before the detecting of the operation on the manipulation device for interacting with the voice interaction device, the method further includes: configuring corresponding corpus texts for keys of the control device. The detecting of the operation on the manipulation device for interacting with the voice interaction device then includes: detecting an operation on a key of the control device. In this way, corresponding corpus texts can be configured for the keys of the control device. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, if the manipulation device has two keys, one key may be configured with the corpus text "play a children's story" and the other key with the corpus text "play children's music". It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In some optional embodiments, the manipulation device includes a child toy, and the detecting of the operation on the manipulation device for interacting with the voice interaction device includes: detecting, through a nine-axis sensor of the child toy, an operation on the child toy. The operation on the child toy includes at least one of: an upward waving operation, a downward waving operation, a leftward waving operation, a rightward waving operation, and a circle drawing operation of the child toy. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In some optional embodiments, the manipulation device includes a plurality of distributed buttons located in different areas, and before the detecting of the operation on the manipulation device for interacting with the voice interaction device, the method further includes: setting, through a client installed on a terminal device, corresponding corpus texts for the plurality of distributed buttons, where the plurality of distributed buttons are each communicatively connected with the client. The detecting of the operation on the manipulation device for interacting with the voice interaction device then includes: detecting an operation on a distributed button. In this way, corresponding corpus texts can be set for the distributed buttons through the client installed on the terminal device (see the sketch below). It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
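The split between the client (which configures) and the buttons (which only transmit) can be sketched as follows. This is a hypothetical in-process Python illustration: DistributedButton and the example room texts are invented for the sketch, and the configure call stands in for the client-to-button network link.

from typing import Optional

class DistributedButton:
    def __init__(self, button_id: str) -> None:
        self.button_id = button_id
        self.corpus_text: Optional[str] = None

    def configure(self, corpus_text: str) -> None:
        # Invoked by the client installed on the terminal device.
        self.corpus_text = corpus_text

    def on_pressed(self) -> None:
        # A configured button transmits nothing but its corpus text.
        if self.corpus_text is not None:
            print(self.button_id, "sends corpus text:", self.corpus_text)

# Client side: one button per area, each bound to its own corpus text.
bedroom_button = DistributedButton("bedroom-1")
kitchen_button = DistributedButton("kitchen-1")
bedroom_button.configure("turn off the bedroom light")
kitchen_button.configure("turn on the range hood")
bedroom_button.on_pressed()  # bedroom-1 sends corpus text: turn off the bedroom light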
In some optional embodiments, the manipulation device includes a handwriting board. In this case, detecting the operation on the manipulation device for interacting with the voice interaction device includes: detecting a corpus text input operation on the handwriting board to generate an input corpus text; and, after the input corpus text is generated, detecting a sending operation for the input corpus text, so that the input corpus text is sent to the voice interaction device. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In step S902, based on the operation, a corresponding corpus text is sent to the voice interaction device, so that the voice interaction device executes a corresponding operation instruction based on the corpus text.
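Taken together, steps S901 and S902 amount to a small control loop on the control-device side, sketched below in Python under assumed names: detect_operation, corpus_for_operation, and transport_send are hypothetical stand-ins for the device's input source, its configured operation-to-corpus-text mapping, and its NFC/Bluetooth/ZigBee/WLAN link to the voice interaction device.

from typing import Callable, Dict

def run_control_device(
    detect_operation: Callable[[], str],    # S901: yields the detected operation
    corpus_for_operation: Dict[str, str],   # operation -> configured corpus text
    transport_send: Callable[[str], None],  # link to the voice interaction device
) -> None:
    operation = detect_operation()                 # step S901
    corpus_text = corpus_for_operation[operation]
    transport_send(corpus_text)                    # step S902

run_control_device(
    detect_operation=lambda: "key_1_pressed",
    corpus_for_operation={"key_1_pressed": "play music"},
    transport_send=lambda text: print("sent corpus text:", text),
)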
In practical applications, a user can conveniently interact with the voice interaction device through the control device: the control device does not need to implement a complex data protocol and only needs to send the corresponding corpus text. Because the data transmitted between the voice interaction device and the control device is reduced to a corpus text, the interaction protocol between the two devices is simpler and more universal.
According to the interaction method provided by this embodiment of the application, an operation on a control device used for interacting with the voice interaction device is detected, the operation instructing the control device to send a corresponding corpus text to the voice interaction device; based on the operation, the corresponding corpus text is sent to the voice interaction device, so that the voice interaction device executes a corresponding operation instruction based on the corpus text. Interaction with the voice interaction device is thereby made convenient.
The interaction method provided by this embodiment may be performed by any suitable device having data processing capability, including but not limited to: a camera, a terminal, a mobile terminal, a PC, a server, an in-vehicle device, an entertainment device, an advertising device, a personal digital assistant (PDA), a tablet computer, a laptop computer, a handheld game console, glasses, a watch, a wearable device, a virtual display device, a display enhancement device, or the like.
Referring to fig. 10, a schematic structural diagram of an interaction device in the tenth embodiment of the present application is shown.
The interaction device provided by the embodiment comprises: a detecting module 1003, configured to detect an operation on a control device configured to interact with the voice interaction device, where the operation is used to instruct the control device to send a corresponding corpus text to the voice interaction device; a sending module 1004, configured to send, based on the operation, a corresponding corpus text to the voice interaction device, so that the voice interaction device executes a corresponding operation instruction based on the corpus text.
Optionally, the apparatus further includes: a configuration module 1001, configured to configure, before the detection module 1003 is invoked, a corresponding corpus text for a key of the control device. The detection module 1003 is then specifically configured to: detect an operation on a key of the control device.
Optionally, the control device includes a child toy, and the detection module 1003 is specifically configured to: detect an operation on the child toy through a nine-axis sensor of the child toy.
Optionally, the operation of the child toy comprises at least one of: an upward waving operation of the child toy, a downward waving operation of the child toy, a leftward waving operation of the child toy, a rightward waving operation of the child toy, a circle drawing operation of the child toy.
Optionally, the control device includes a plurality of distributed buttons located in different areas, and the apparatus further includes: a setting module 1002, configured to set, through a client installed on a terminal device, corresponding corpus texts for the plurality of distributed buttons, where the plurality of distributed buttons are each communicatively connected with the client. The detection module 1003 is then specifically configured to: detect an operation on a distributed button.
Optionally, the control device includes a handwriting board, and the detection module 1003 is specifically configured to: detect a corpus text input operation on the handwriting board to generate an input corpus text; and, after the input corpus text is generated, detect a sending operation for the input corpus text, so that the input corpus text is sent to the voice interaction device.
Optionally, the control device interacts with the voice interaction device through at least one of the following communication modes: near field communication mode, bluetooth, zigbee, wireless local area network.
Optionally, the voice interaction device comprises at least one of: the system comprises a sound box with a voice interaction function, a television with the voice interaction function and a mobile phone terminal with the voice interaction function.
The interaction device of this embodiment is used to implement the corresponding interaction method in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again.
Fig. 11 is a schematic structural diagram of an electronic device in an eleventh embodiment of the application; the electronic device may include:
one or more processors 1101;
a computer-readable medium 1102, which may be configured to store one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the interaction method according to the above-mentioned embodiment nine.
Fig. 12 is a hardware configuration of an electronic device according to a twelfth embodiment of the present application; as shown in fig. 12, the hardware structure of the electronic device may include: a processor 1201, a communication interface 1202, a computer readable medium 1203, and a communication bus 1204;
wherein the processor 1201, the communication interface 1202, and the computer readable medium 1203 are in communication with each other via a communication bus 1204;
optionally, the communication interface 1202 may be an interface of a communication module, such as an interface of a GSM module;
the processor 1201 may be specifically configured to: detect an operation on a control device used for interacting with the voice interaction device, where the operation instructs the control device to send a corresponding corpus text to the voice interaction device; and send, based on the operation, the corresponding corpus text to the voice interaction device, so that the voice interaction device executes a corresponding operation instruction based on the corpus text.
The processor 1201 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The computer-readable medium 1203 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or the like.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code configured to perform the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program, when executed by a central processing unit (CPU), performs the above-described functions defined in the method of the present application. It should be noted that the computer-readable medium described herein may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wireline, optical fiber cable, RF, or any suitable combination of the foregoing.
Computer program code configured to carry out operations of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the remote computer case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions configured to implement the specified logical function(s). In the above embodiments, specific precedence relationships are provided, but these precedence relationships are only exemplary; in particular implementations, there may be fewer or more steps, or the execution order may be modified. That is, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems which perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor including a detection module and a sending module. The names of these modules do not in some cases constitute a limitation on the modules themselves; for example, the detection module may also be described as "a module that detects an operation on a manipulation device for interacting with the voice interaction device".
As another aspect, the present application further provides a computer-readable medium, on which a computer program is stored, which when executed by a processor implements the interaction method described in the above embodiment nine.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: detecting an operation on a control device used for interacting with the voice interaction device, wherein the operation is used for instructing the control device to send a corresponding corpus text to the voice interaction device; and sending a corresponding corpus text to the voice interaction equipment based on the operation, so that the voice interaction equipment executes a corresponding operation instruction based on the corpus text.
The expressions "first", "second", "said first" or "said second" used in various embodiments of the present disclosure may modify various components regardless of order and/or importance, but these expressions do not limit the respective components. The above description is only configured for the purpose of distinguishing elements from other elements. For example, the first user equipment and the second user equipment represent different user equipment, although both are user equipment. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present disclosure.
When an element (e.g., a first element) is referred to as being "(operably or communicatively) coupled" or "connected" to another element (e.g., a second element), it should be understood that the element is connected to the other element either directly or indirectly via yet another element (e.g., a third element). In contrast, when an element (e.g., a first element) is referred to as being "directly connected" or "directly coupled" to another element (e.g., a second element), no element (e.g., a third element) is interposed between them.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (29)

1. An interactive system, the system comprising:
the voice interaction device and the control device which is in communication connection with the voice interaction device,
the control device is used for detecting the operation of the control device used for interacting with the voice interaction device and sending corresponding corpus texts to the voice interaction device based on the operation;
the voice interaction device is used for receiving the corpus text sent by the control device and executing a corresponding operation instruction based on the corpus text.
2. The system of claim 1, wherein the manipulation device comprises a child toy,
the children toy is used for detecting the operation of the children toy through the nine-axis sensor of the children toy, and sending corresponding corpus texts to the voice interaction equipment based on the operation.
3. The system of claim 2, wherein the operation on the child toy comprises at least one of: an upward waving operation of the child toy, a downward waving operation of the child toy, a leftward waving operation of the child toy, a rightward waving operation of the child toy, a circle drawing operation of the child toy.
4. The system of claim 1, wherein the steering device comprises a plurality of distributed buttons located in different regions,
the distributed button is used for detecting the operation of the distributed button for interacting with the voice interaction equipment and sending corresponding corpus texts to the voice interaction equipment based on the operation.
5. The system of claim 4, wherein the system further comprises:
and the terminal equipment is in communication connection with the distributed buttons and is used for respectively setting corresponding corpus texts for the distributed buttons through a client installed on the terminal equipment.
6. The system of claim 1, wherein the manipulation device comprises a writing pad,
the writing pad is used for detecting a corpus text input operation on the writing pad so as to generate an input corpus text, and detecting, after the input corpus text is generated, a sending operation for the input corpus text so as to send the input corpus text to the voice interaction device.
7. The system according to claim 1, wherein the manipulation device includes a first button for transmitting the corpus text and a second button for setting the corpus text corresponding to the first button, which are disposed on a panel,
and the second button is used for setting the corpus text corresponding to the first button based on the pressing duration of the second button by the user.
8. The system according to claim 1, wherein the manipulation device includes a first button for transmitting the corpus text and a second button for setting the corpus text corresponding to the first button, which are disposed on a panel,
and the second button is used for setting the corpus text corresponding to the first button based on the continuous pressing times of the user for the second button.
9. The system of any one of claims 1-8, wherein the steering device interacts with the voice interaction device through at least one of:
near field communication mode, bluetooth, zigbee, wireless local area network.
10. The system of any one of claims 1-8, wherein the voice interaction device comprises at least one of:
the system comprises a sound box with a voice interaction function, a television with the voice interaction function and a mobile phone terminal with the voice interaction function.
11. An interactive system, the system comprising:
a sweeper with a voice interaction function and a button panel in communication connection with the sweeper,
the button panel is used for detecting the operation of the button panel for interacting with the sweeper and sending the corpus text for sweeping in the kitchen to the sweeper based on the operation;
the sweeper is used for receiving the corpus text sent by the button panel and sweeping the floor in the kitchen based on the corpus text.
12. An interactive system, the system comprising:
the voice interaction device and the control device which is in communication connection with the voice interaction device,
the control device is used for detecting an operation on the control device for interacting with the voice interaction device, determining an operation time point corresponding to the operation, and sending a corresponding corpus text to the voice interaction device based on the customized time period within which the operation time point falls;
the voice interaction device is used for receiving the corpus text sent by the control device and executing a corresponding operation instruction based on the corpus text.
13. A voice interaction device, the device comprising:
a microphone, a loudspeaker, and a short-range wireless communication device, each connected with a controller through a circuit board;
the microphone collects voice data of a user and sends the voice data to the controller, and the controller forwards the voice data to a cloud end, so that the cloud end executes a corresponding voice instruction according to the voice data and controls the loudspeaker to play an execution result of the voice instruction;
the short-range wireless communication device is used for receiving a corpus text sent by a manipulation device interacting with the voice interaction device based on an operation on the manipulation device, and sending the corpus text to the controller, and the controller forwards the corpus text to the cloud end, so that the cloud end executes a corresponding operation instruction according to the corpus text and controls the loudspeaker to play an execution result of the operation instruction.
14. The device of claim 13, wherein the short-range wireless communication device comprises at least one of: a Bluetooth device, a ZigBee device, a near field communication device, and a wireless local area network device.
15. A voice interaction device, the device comprising:
a microphone, a loudspeaker, and a button, each connected with a controller through a circuit board;
the microphone collects voice data of a user and sends the voice data to the controller, and the controller forwards the voice data to a cloud end, so that the cloud end executes a corresponding voice instruction according to the voice data and controls the loudspeaker to play an execution result of the voice instruction;
the button is used for detecting an operation on the button and sending a corresponding corpus text to the controller based on the operation, and the controller forwards the corpus text to the cloud end, so that the cloud end executes a corresponding operation instruction according to the corpus text and controls the loudspeaker to play an execution result of the operation instruction.
16. A steering device, the device comprising:
a key and a controller in communication connection with the key,
the controller is configured to send a corresponding corpus text to a voice interaction device when it is detected that the key is pressed, so that the voice interaction device executes a corresponding operation instruction based on the corpus text.
17. A steering device, the device comprising:
a sensor for detecting an operation for the manipulation device;
and the controller is in communication connection with the sensor and is used for sending corresponding corpus texts to the voice interaction equipment based on the operation when the sensor detects the operation aiming at the control equipment, so that the voice interaction equipment executes corresponding operation instructions based on the corpus texts.
18. A steering device, the device comprising:
the input device is used for determining the corpus text input by the user based on the corpus text input operation of the user and sending the corpus text input by the user to the controller in communication connection with the input device;
the controller is used for sending the corpus text input by the user to the voice interaction equipment, so that the voice interaction equipment executes a corresponding operation instruction based on the corpus text input by the user.
19. A method of interaction, the method comprising:
detecting an operation on a control device used for interacting with the voice interaction device, wherein the operation is used for instructing the control device to send a corresponding corpus text to the voice interaction device;
and sending a corresponding corpus text to the voice interaction equipment based on the operation, so that the voice interaction equipment executes a corresponding operation instruction based on the corpus text.
20. The method of claim 19, wherein prior to the detecting operation of a steering device for interacting with the voice interaction device, the method further comprises:
configuring corresponding corpus texts for keys of the control equipment;
the detecting operation on a manipulation device for interacting with the voice interaction device includes:
detecting operation of a key of the control device.
21. The method of claim 19, wherein the manipulation device comprises a child toy, and the detecting operation of the manipulation device for interacting with the voice interaction device comprises:
detecting, by a nine-axis sensor of the child toy, an operation of the child toy.
22. The method of claim 21, wherein the operation on the child toy comprises at least one of: an upward waving operation of the child toy, a downward waving operation of the child toy, a leftward waving operation of the child toy, a rightward waving operation of the child toy, a circle drawing operation of the child toy.
23. The method of claim 19, wherein the manipulation device comprises a plurality of distributed buttons located in different regions,
before the detecting an operation on a manipulation device for interacting with the voice interaction device, the method further comprises:
setting, through a client installed on a terminal device, corresponding corpus texts for the plurality of distributed buttons, wherein the plurality of distributed buttons are respectively in communication connection with the client,
the detecting an operation on a manipulation device for interacting with the voice interaction device comprises:
detecting an operation on a distributed button.
24. The method of claim 19, wherein the manipulation device comprises a writing pad,
the detecting an operation on a manipulation device for interacting with the voice interaction device comprises:
detecting a corpus text input operation on the writing pad to generate an input corpus text; and
after the input corpus text is generated, detecting a sending operation for the input corpus text, so that the input corpus text is sent to the voice interaction device.
25. The method according to any one of claims 19-24, wherein the steering device interacts with the voice interaction device through at least one of the following communication means:
near field communication mode, bluetooth, zigbee, wireless local area network.
26. The method of any one of claims 19-24, wherein the voice interaction device comprises at least one of:
the system comprises a sound box with a voice interaction function, a television with the voice interaction function and a mobile phone terminal with the voice interaction function.
27. An interactive apparatus, the apparatus comprising:
the detection module is used for detecting operation on control equipment used for interacting with the voice interaction equipment, and the operation is used for instructing the control equipment to send corresponding corpus texts to the voice interaction equipment;
and the sending module is used for sending the corresponding corpus text to the voice interaction equipment based on the operation so that the voice interaction equipment executes a corresponding operation instruction based on the corpus text.
28. An electronic device, the device comprising:
one or more processors;
a computer readable medium configured to store one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the interaction method of any one of claims 19-26.
29. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the interaction method according to any one of claims 19 to 26.
CN202010297149.6A 2020-04-15 2020-04-15 Interaction method, device, system, voice interaction equipment, control equipment and medium Pending CN113539250A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010297149.6A CN113539250A (en) 2020-04-15 2020-04-15 Interaction method, device, system, voice interaction equipment, control equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010297149.6A CN113539250A (en) 2020-04-15 2020-04-15 Interaction method, device, system, voice interaction equipment, control equipment and medium

Publications (1)

Publication Number Publication Date
CN113539250A true CN113539250A (en) 2021-10-22

Family

ID=78088315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010297149.6A Pending CN113539250A (en) 2020-04-15 2020-04-15 Interaction method, device, system, voice interaction equipment, control equipment and medium

Country Status (1)

Country Link
CN (1) CN113539250A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103201790A (en) * 2010-11-22 2013-07-10 Lg电子株式会社 Control method using voice and gesture in multimedia device and multimedia device thereof
CN204129661U (en) * 2014-10-31 2015-01-28 柏建华 Wearable device and there is the speech control system of this wearable device
CN106662990A (en) * 2014-07-31 2017-05-10 微软技术许可有限责任公司 Speechless interaction with a speech recognition device
CN106782535A (en) * 2016-12-26 2017-05-31 深圳前海勇艺达机器人有限公司 Data processing method and device based on intelligent appliance
CN107657953A (en) * 2017-09-27 2018-02-02 上海爱优威软件开发有限公司 Sound control method and system
CN107967055A (en) * 2017-11-16 2018-04-27 深圳市金立通信设备有限公司 A kind of man-machine interaction method, terminal and computer-readable medium
CN108062031A (en) * 2018-02-13 2018-05-22 宁夏煜隆科技有限公司 Intelligent home furnishing control method, device, system and electronic equipment
CN108495212A (en) * 2018-05-09 2018-09-04 惠州超声音响有限公司 A kind of system interacted with intelligent sound
CN109085885A (en) * 2018-08-14 2018-12-25 李兴伟 Intelligent ring
CN109767773A (en) * 2019-03-26 2019-05-17 北京百度网讯科技有限公司 Information output method and device based on interactive voice terminal
CN109920425A (en) * 2019-04-03 2019-06-21 北京石头世纪科技股份有限公司 Robot voice control method and device, robot and medium
CN212624795U (en) * 2020-04-15 2021-02-26 阿里巴巴集团控股有限公司 Interactive system, voice interaction equipment and control equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination