CN109326289B - Wake-up-free voice interaction method, device, equipment and storage medium - Google Patents

Wake-up-free voice interaction method, device, equipment and storage medium

Info

Publication number
CN109326289B
CN109326289B (application CN201811464212.XA)
Authority
CN
China
Prior art keywords
voice interaction
instruction
current
wake
dialog
Prior art date
Legal status
Active
Application number
CN201811464212.XA
Other languages
Chinese (zh)
Other versions
CN109326289A (en)
Inventor
姚凯
Current Assignee
Shenzhen Skyworth Digital Technology Co Ltd
Original Assignee
Shenzhen Skyworth Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Skyworth Digital Technology Co Ltd filed Critical Shenzhen Skyworth Digital Technology Co Ltd
Priority to CN201811464212.XA
Publication of CN109326289A
Application granted
Publication of CN109326289B

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26 - Speech to text systems
    • G10L 2015/223 - Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a wake-up-free voice interaction method, apparatus, device and storage medium. Speech recognition is performed on a target audio signal of the current environment to obtain text information; semantic recognition is performed on the text information to obtain a complete intention vector; the current dialog scene to which the current environment belongs is obtained, and irrelevant fields in the complete intention vector are filtered out according to the current dialog scene; an operation instruction is obtained according to the filtered complete intention vector, and function control is performed on the target device according to the operation instruction to realize voice interaction. By judging the complete intention vector, the influence of irrelevant conversation on voice interaction is avoided, so the user can converse naturally with the voice device without a wake-up word, device start-up time is saved, and the user's far-field voice interaction experience is improved.

Description

Wake-up-free voice interaction method, device, equipment and storage medium
Technical Field
The present invention relates to the field of speech recognition, and in particular, to a wake-up-free voice interaction method, apparatus, device, and storage medium.
Background
In existing far-field intelligent voice products, the user must wake the product before using it, that is, start its recognition function with a customized command word before any further interaction can take place. For example, the user first says a wake-up word (such as "Xiaodu"), the device gives feedback, and the user then continues with "I want to watch a movie"; the wake-up word itself is only a trigger, and after the device receives and executes the instruction it closes the interaction and waits for the next wake-up.
Relying on a wake-up word as a mandatory first step harms the experience of many far-field voice interactions. The two common problems are failure to wake and false wake-up: a failure to wake means the subsequent command cannot proceed, while a false wake-up occurs when the user has given no instruction but the device, disturbed by ambient sound, mistakenly believes an instruction exists.
Disclosure of Invention
The main object of the present invention is to provide a wake-up-free voice interaction method, apparatus, device and storage medium, aiming to solve the technical problem in the prior art that waking a voice product with a wake-up word easily leads to false wake-ups or failures to wake, so that the user's intention cannot be clearly understood and the user's voice interaction experience is poor.
In order to achieve the above object, the present invention provides a wake-up free voice interaction method, which includes the following steps:
performing speech recognition on a target audio signal of the current environment to obtain text information;
performing semantic recognition on the text information to obtain a complete intention vector;
obtaining the current dialog scene to which the current environment belongs, and filtering out irrelevant fields in the complete intention vector according to the current dialog scene;
and obtaining an operation instruction according to the filtered complete intention vector, and performing function control on the target device according to the operation instruction to realize voice interaction.
Preferably, the performing semantic recognition on the text information to obtain a complete intention vector includes:
performing semantic recognition on the text information according to a preset intention word database to obtain a dialog object, a functional field and an instruction verb in the text information;
and determining a complete intention vector according to the dialog object, the functional field and the instruction verb.
Preferably, the determining a complete intention vector according to the dialog object, the functional field and the instruction verb includes:
obtaining, from a preset historical voice database, the historical functional field of the previous dialog adjacent to the current dialog;
matching the functional field against the historical functional field to generate a matching result;
and determining a complete intention vector according to the matching result, the dialog object, the functional field and the instruction verb.
Preferably, the determining a complete intention vector according to the matching result, the dialog object, the functional field and the instruction verb includes:
when the matching result is that the functional field is the same as the historical functional field, taking the dialog object, the functional field and the instruction verb as the complete intention vector;
when the matching result is that the functional field differs from the historical functional field, judging whether the dialog object, the functional field and the instruction verb are vectors contained in a preset vector set;
and when the dialog object, the functional field and the instruction verb are vectors contained in the preset vector set, taking the dialog object, the functional field and the instruction verb as the complete intention vector.
Preferably, the obtaining the current dialog scene to which the current environment belongs and filtering out irrelevant fields in the complete intention vector according to the current dialog scene includes:
obtaining the current dialog scene to which the current environment belongs, and substituting the current dialog scene into a preset dialog field model to obtain a current dialog scene field set;
and matching the complete intention vector against the current dialog scene field set to obtain the fields that cannot be matched, and taking the fields that cannot be matched as the irrelevant fields in the complete intention vector.
Preferably, the obtaining an operation instruction according to the filtered complete intention vector and performing function control on the target device according to the operation instruction to realize voice interaction includes:
substituting the filtered complete intention vector into a preset instruction model to obtain a target instruction and taking the target instruction as the operation instruction, the preset instruction model being used to reflect the mapping relationship between each complete intention vector and an instruction;
and performing function control on the target device according to the operation instruction to realize voice interaction.
Preferably, before the performing speech recognition on the target audio signal of the current environment to obtain text information, the wake-up-free voice interaction method further includes:
receiving the sound of the current environment through a microphone of the current device, and generating the target audio signal according to the sound of the current environment.
In addition, to achieve the above object, the present invention further provides a wake-up-free voice interaction device, the wake-up-free voice interaction device including: a memory, a processor, and a wake-up-free voice interaction program stored in the memory and executable on the processor, the wake-up-free voice interaction program being configured to implement the steps of the wake-up-free voice interaction method described above.
In addition, to achieve the above object, the present invention further provides a storage medium storing a wake-up-free voice interaction program which, when executed by a processor, implements the steps of the wake-up-free voice interaction method described above.
In addition, to achieve the above object, the present invention further provides a wake-up-free voice interaction apparatus, including: an information acquisition module, a semantic recognition module, a filtering module and a voice interaction module;
the information acquisition module is configured to perform speech recognition on a target audio signal of the current environment to obtain text information;
the semantic recognition module is configured to perform semantic recognition on the text information to obtain a complete intention vector;
the filtering module is configured to obtain the current dialog scene to which the current environment belongs and filter out irrelevant fields in the complete intention vector according to the current dialog scene;
and the voice interaction module is configured to obtain an operation instruction according to the filtered complete intention vector and perform function control on the target device according to the operation instruction to realize voice interaction.
With the wake-up-free voice interaction method provided by the invention, speech recognition is performed on the target audio signal of the current environment to obtain text information; semantic recognition is performed on the text information to obtain a complete intention vector; the current dialog scene to which the current environment belongs is obtained, and irrelevant fields in the complete intention vector are filtered out according to the current dialog scene; an operation instruction is obtained according to the filtered complete intention vector, and function control is performed on the target device according to the operation instruction to realize voice interaction. By judging the complete intention vector, the influence of irrelevant conversation on voice interaction is avoided; the user can converse naturally with the voice device without a wake-up word, device start-up time is saved, and the user's far-field voice interaction experience is improved.
Drawings
Fig. 1 is a schematic structural diagram of a wake-up-free voice interaction device in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a wake-up free voice interaction method according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a wake-up free voice interaction method according to a second embodiment of the present invention;
FIG. 4 is a flowchart illustrating a wake-up free voice interaction method according to a third embodiment of the present invention;
fig. 5 is a functional block diagram of a wake-up free voice interaction apparatus according to a first embodiment of the present invention.
The objects, features and advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The solution of the embodiments of the invention is mainly as follows: speech recognition is performed on a target audio signal of the current environment to obtain text information; semantic recognition is performed on the text information to obtain a complete intention vector; the current dialog scene to which the current environment belongs is obtained, and irrelevant fields in the complete intention vector are filtered out according to the current dialog scene; an operation instruction is obtained according to the filtered complete intention vector, and function control is performed on the target device according to the operation instruction to realize voice interaction. By judging the complete intention vector, the influence of irrelevant conversation on voice interaction can be avoided, and the user can converse naturally with the voice device without a wake-up word, which saves device start-up time and improves the user's far-field voice interaction experience. This solves the technical problem in the prior art that waking a voice product with a wake-up word easily causes false wake-ups or failures to wake, so that the user's intention cannot be clearly understood and the voice interaction experience is poor.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a wake-up free voice interaction device in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the wake-up-free voice interaction device may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a standard wired interface or a wireless interface. The network interface 1004 may optionally include a standard wired interface or a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will understand that the wake-up-free voice interaction device structure shown in fig. 1 does not constitute a limitation of the device, which may include more or fewer components than those shown, combine certain components, or arrange the components differently.
As shown in fig. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module, and a wake-up-free voice interaction program. In the wake-up-free voice interaction device shown in fig. 1, the processor 1001 may call the wake-up-free voice interaction program stored in the memory 1005 and perform the following operations:
performing speech recognition on a target audio signal of the current environment to obtain text information;
performing semantic recognition on the text information to obtain a complete intention vector;
obtaining the current dialog scene to which the current environment belongs, and filtering out irrelevant fields in the complete intention vector according to the current dialog scene;
and obtaining an operation instruction according to the filtered complete intention vector, and performing function control on the target device according to the operation instruction to realize voice interaction.
Further, the processor 1001 may call the wake-free voice interaction program stored in the memory 1005, and further perform the following operations:
performing semantic recognition on the text information according to a preset intention word database to obtain a dialog object, a functional field and an instruction verb in the text information;
and determining a complete intention vector according to the dialog object, the functional field and the instruction verb.
Further, the processor 1001 may call the wake-free voice interaction program stored in the memory 1005, and further perform the following operations:
obtaining, from a preset historical voice database, the historical functional field of the previous dialog adjacent to the current dialog;
matching the functional field against the historical functional field to generate a matching result;
and determining a complete intention vector according to the matching result, the dialog object, the functional field and the instruction verb.
Further, the processor 1001 may call the wake-free voice interaction program stored in the memory 1005, and further perform the following operations:
when the matching result is that the functional field is the same as the historical functional field, taking the dialog object, the functional field and the instruction verb as the complete intention vector;
when the matching result is that the functional field differs from the historical functional field, judging whether the dialog object, the functional field and the instruction verb are vectors contained in a preset vector set;
and when the dialog object, the functional field and the instruction verb are vectors contained in the preset vector set, taking the dialog object, the functional field and the instruction verb as the complete intention vector.
Further, the processor 1001 may call the wake-free voice interaction program stored in the memory 1005, and further perform the following operations:
obtaining the current dialog scene to which the current environment belongs, and substituting the current dialog scene into a preset dialog field model to obtain a current dialog scene field set;
and matching the complete intention vector against the current dialog scene field set to obtain the fields that cannot be matched, and taking the fields that cannot be matched as the irrelevant fields in the complete intention vector.
Further, the processor 1001 may call the wake-free voice interaction program stored in the memory 1005, and further perform the following operations:
substituting the filtered complete intention vector into a preset instruction model to obtain a target instruction and taking the target instruction as the operation instruction, the preset instruction model being used to reflect the mapping relationship between each complete intention vector and an instruction;
and performing function control on the target device according to the operation instruction to realize voice interaction.
Further, the processor 1001 may call the wake-free voice interaction program stored in the memory 1005, and further perform the following operations:
receiving the sound of the current environment through a microphone of the current device, and generating the target audio signal according to the sound of the current environment.
According to this scheme, speech recognition is performed on the target audio signal of the current environment to obtain text information; semantic recognition is performed on the text information to obtain a complete intention vector; the current dialog scene to which the current environment belongs is obtained, and irrelevant fields in the complete intention vector are filtered out according to the current dialog scene; an operation instruction is obtained according to the filtered complete intention vector, and function control is performed on the target device according to the operation instruction to realize voice interaction. By judging the complete intention vector, the influence of irrelevant conversation on voice interaction is avoided; the user can converse naturally with the voice device without a wake-up word, device start-up time is saved, and the user's far-field voice interaction experience is improved.
Based on the hardware structure, the embodiment of the wake-up-free voice interaction method is provided.
Referring to fig. 2, fig. 2 is a flowchart illustrating a wake-up free voice interaction method according to a first embodiment of the present invention.
In a first embodiment, the wake-free voice interaction method includes the following steps:
and step S10, performing voice recognition on the target audio signal of the current environment to obtain character information.
It should be noted that, the target audio signal of the current environment is an audio signal collection mode corresponding to the sound collected by the current environment, the sound of the current environment can be collected by the microphone device and converted into a corresponding audio signal, and the text information can be obtained by performing speech recognition on the target audio signal.
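As an illustration of step S10, the following sketch converts a recorded audio file into text. The patent does not name a recognition engine; the SpeechRecognition Python package with its Google Web Speech backend is used here only as an assumed stand-in, and the file path and language code are likewise illustrative.

```python
# Illustrative only: the patent does not specify an ASR engine.
import speech_recognition as sr

def audio_to_text(wav_path: str, language: str = "zh-CN") -> str:
    """Transcribe a WAV recording into text information (step S10 analogue)."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)          # read the whole recording
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return ""                                  # nothing intelligible was recognized
```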
Further, before the step S10, the wake-free voice interaction method further includes the following steps:
the method comprises the steps of receiving sound of the current environment through a microphone of the current equipment, and generating a target audio signal according to the sound of the current environment.
It can be understood that the microphone of the current device can receive the sound of the current environment in real time, and then generate a corresponding target audio signal according to the sound of the current environment, generally, sound source localization and adaptive beam implementation can be performed through the microphone array, and after the target audio signal is generated, the influence caused by interference sounds such as noise, reverberation and echo can be solved through noise reduction processing.
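The capture step above can be pictured with a minimal delay-and-sum beamformer over a microphone-array buffer. This is a hedged numpy sketch, not the patent's implementation: the array geometry, the per-channel delays (normally derived from sound source localization) and the absence of echo and noise suppression are all simplifying assumptions.

```python
import numpy as np

def delay_and_sum(mics: np.ndarray, delays: np.ndarray) -> np.ndarray:
    """mics: (n_channels, n_samples) buffer; delays: integer sample delays per channel."""
    n_ch, n_samp = mics.shape
    out = np.zeros(n_samp)
    for ch in range(n_ch):
        d = int(delays[ch])
        # advance each channel so wavefronts from the target direction line up
        out[: n_samp - d] += mics[ch, d:]
    return out / n_ch  # averaged, beamformed mono signal
```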
Step S20: performing semantic recognition on the text information to obtain a complete intention vector.
It can be understood that a complete intention vector can be obtained by performing semantic recognition on the text information. The complete intention vector includes, but is not limited to, a dialog object, a functional field, an instruction verb and instruction entity parameters. The dialog object is a preset imaginary persona representing the device and can be customized for different scenes; for example, the persona that manages smart-home devices may be called "grandfather", the one that helps with movie search "sprite", and the one that answers knowledge questions "majacobian". The functional field is a preset concrete function implemented by the device, including but not limited to "movie", "music", "device control", "weather", "news", "stock securities" and "audio programs". The instruction verb is a preset control verb for controlling the target device, including but not limited to "search", "play", "control sound", "control progress", "query", "order" and "payment". The instruction entity is a preset entity name existing in each functional field, including but not limited to "city", "time", "article", "video and audio work", "company name" and "celebrity".
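A possible in-memory representation of the complete intention vector described above is sketched below; the field names and example values are assumptions made for illustration, not a data format defined by the patent.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class IntentVector:
    dialog_object: Optional[str]        # e.g. "sprite" for the movie-search persona
    functional_field: str               # e.g. "movie", "music", "device control"
    instruction_verb: str               # e.g. "search", "play", "query"
    entity_params: dict = field(default_factory=dict)  # e.g. {"video and audio work": "..."}

# "Sprite, play <some film>" might be parsed into:
example = IntentVector("sprite", "movie", "play", {"video and audio work": "some film"})
```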
Step S30: obtaining the current dialog scene to which the current environment belongs, and filtering out irrelevant fields in the complete intention vector according to the current dialog scene.
It should be understood that the current dialog scene is the dialog scene to which the current environment belongs, and different dialog scenes correspond to different field sets. A corresponding target field set can be obtained from the dialog scene, and whether an irrelevant field exists in the complete intention vector can be determined by checking whether each field in the complete intention vector can be matched against the target field set. Filtering the complete intention vector avoids interference of irrelevant fields with the voice interaction and improves its speed and efficiency.
Step S40: obtaining an operation instruction according to the filtered complete intention vector, and performing function control on the target device according to the operation instruction to realize voice interaction.
It can be understood that the operation instruction may be a special field or a segment of an executable program. A corresponding operation instruction can be generated according to the filtered complete intention vector, and the function control corresponding to the operation instruction can then be performed on the target device, thereby realizing voice interaction.
Further, the step S40 includes the following steps:
substituting the filtered complete intention vector into a preset instruction model to obtain a target instruction and taking the target instruction as the operation instruction, the preset instruction model being used to reflect the mapping relationship between each complete intention vector and an instruction;
and performing function control on the target device according to the operation instruction to realize voice interaction.
It should be understood that the preset instruction model is a preset model for obtaining the relevant operation instruction and reflects the mapping relationship between each complete intention vector and an instruction. A target instruction can be obtained by substituting the filtered complete intention vector into the preset instruction model, and this target instruction serves as the operation instruction. Performing function control on the target device through the operation instruction then realizes voice interaction; the function control is generally direct playback or a further voice confirmation, but may of course be other function control, which is not limited in this embodiment.
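One way to picture the preset instruction model is a lookup table keyed by (functional field, instruction verb) that returns a device operation. This is a sketch under assumed names; the patent does not prescribe the keying scheme or the handler functions.

```python
from typing import Callable, Dict, Tuple

def play_media(params: dict) -> str:
    return f"playing {params.get('video and audio work', 'selected content')}"

def query_weather(params: dict) -> str:
    return f"reporting weather for {params.get('city', 'the current city')}"

# preset instruction model: mapping from intention-vector components to an operation
PRESET_INSTRUCTION_MODEL: Dict[Tuple[str, str], Callable[[dict], str]] = {
    ("movie", "play"): play_media,
    ("weather", "query"): query_weather,
}

def execute(functional_field: str, instruction_verb: str, entity_params: dict) -> str:
    handler = PRESET_INSTRUCTION_MODEL.get((functional_field, instruction_verb))
    if handler is None:
        return "no matching operation instruction"   # outside the supported mappings
    return handler(entity_params)                    # function control of the target device
```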
According to this scheme, speech recognition is performed on the target audio signal of the current environment to obtain text information; semantic recognition is performed on the text information to obtain a complete intention vector; the current dialog scene to which the current environment belongs is obtained, and irrelevant fields in the complete intention vector are filtered out according to the current dialog scene; an operation instruction is obtained according to the filtered complete intention vector, and function control is performed on the target device according to the operation instruction to realize voice interaction. By judging the complete intention vector, the influence of irrelevant conversation on voice interaction is avoided; the user can converse naturally with the voice device without a wake-up word, device start-up time is saved, and the user's far-field voice interaction experience is improved.
Further, fig. 3 is a flowchart illustrating a second embodiment of the wake-up-free voice interaction method, and as shown in fig. 3, the second embodiment of the wake-up-free voice interaction method is proposed based on the first embodiment, in this embodiment, the step S10 specifically includes the following steps:
and step S11, performing semantic recognition on the character information according to a preset intention word database, and acquiring a dialog object, a function field and an instruction verb in the character information.
It should be noted that the preset intention database is a preset database for storing various intention words, and includes a text extraction framework, and a dialog object, a function field and an instruction verb in the text information can be obtained through the text extraction framework; the intention words are words or fields containing voice interaction intentions, semantic recognition is carried out on the text information according to the preset intention word database, and conversation objects, the function fields and the instruction verbs in the text information can be obtained.
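For step S11, a toy version of the lookup against a preset intention word database might look as follows. The three small collections stand in for the database and the English keywords are invented for the example; a real system would rely on a proper text-extraction framework.

```python
DIALOG_OBJECTS = {"grandfather", "sprite"}
FUNCTIONAL_FIELDS = {"movie": ["movie", "film"], "weather": ["weather", "rain"]}
INSTRUCTION_VERBS = {"play", "search", "query"}

def extract_intent_words(text: str):
    """Return (dialog_object, functional_field, instruction_verb); any item may be None."""
    words = text.lower().split()
    dialog_object = next((w for w in words if w in DIALOG_OBJECTS), None)
    verb = next((w for w in words if w in INSTRUCTION_VERBS), None)
    field_name = next(
        (name for name, keywords in FUNCTIONAL_FIELDS.items()
         if any(k in words for k in keywords)),
        None,
    )
    return dialog_object, field_name, verb

# extract_intent_words("sprite play a movie") -> ("sprite", "movie", "play")
```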
Step S12: determining a complete intention vector according to the dialog object, the functional field and the instruction verb.
It can be understood that a complete intention vector can be determined from the dialog object, the functional field and the instruction verb, i.e., the dialog object, the functional field and the instruction verb together form the complete intention vector.
Further, the step S12 includes the following steps:
obtaining, from a preset historical voice database, the historical functional field of the previous dialog adjacent to the current dialog;
matching the functional field against the historical functional field to generate a matching result;
and determining a complete intention vector according to the matching result, the dialog object, the functional field and the instruction verb.
It should be understood that the preset historical voice database is a database in which the target device stores historical voice data over a preset time period. The historical voice data of the previous dialog adjacent to the current dialog can be obtained from this database, and the historical functional field can be extracted from that data; the historical functional field may also be obtained in other ways.
It can be understood that matching the functional field against the historical functional field generates a corresponding matching result, and by analyzing the matching result, a complete intention vector can be determined from the dialog object, the functional field and the instruction verb.
Further, the step of determining a complete intention vector according to the matching result, the dialog object, the functional field and the instruction verb includes:
when the matching result is that the functional field is the same as the historical functional field, taking the dialog object, the functional field and the instruction verb as the complete intention vector;
when the matching result is that the functional field differs from the historical functional field, judging whether the dialog object, the functional field and the instruction verb are vectors contained in a preset vector set;
and when the dialog object, the functional field and the instruction verb are vectors contained in the preset vector set, taking the dialog object, the functional field and the instruction verb as the complete intention vector.
It should be understood that when the matching result indicates that the functional field is not the same as the historical functional field, that is, when the matching fails, it is judged whether the dialog object, the functional field and the instruction verb are vectors contained in a preset vector set. The preset vector set is a preset set of supported vectors; by comparing the dialog object, the functional field and the instruction verb against the vectors in the preset vector set, the complete intention vector can be determined from the comparison result.
In a specific implementation, when the functional field of the current text information differs from the historical functional field, it is judged whether the dialog object, the functional field and the instruction verb are vectors contained in the preset vector set, that is, whether the target device supports them; this determines whether they are relevant or irrelevant to the target device, and hence whether a complete intention vector exists. When the dialog object, the functional field and the instruction verb are vectors contained in the preset vector set, they are taken as the complete intention vector. When the matching result is that the functional field is the same as the historical functional field, that is, when the matching succeeds, the dialog object, the functional field and the instruction verb are directly taken as the complete intention vector. The dialog object may be absent; as long as the instruction verb is present, the remaining components can still be taken as the complete intention vector.
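The decision just described can be summarized in a few lines. Only the control flow (accept a continuation of the previous dialog's field, otherwise check membership in the preset vector set) reflects the text; the contents of the set are assumptions for illustration.

```python
PRESET_VECTOR_SET = {
    ("sprite", "movie", "play"),          # assumed entries, for illustration only
    (None, "device control", "query"),
}

def decide_complete_intent(dialog_object, functional_field, instruction_verb, history_field):
    candidate = (dialog_object, functional_field, instruction_verb)
    if functional_field == history_field:
        return candidate                  # continuation of the previous dialog's field
    if candidate in PRESET_VECTOR_SET:
        return candidate                  # topic changed, but the device supports it
    return None                           # treated as irrelevant conversation
```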
According to this scheme, semantic recognition is performed on the text information according to a preset intention word database to obtain the dialog object, the functional field and the instruction verb in the text information, and a complete intention vector is determined from them. This improves the accuracy of voice interaction function control and avoids interference from irrelevant information; the user can converse naturally with the voice device without a wake-up word, device start-up time is saved, and the user's far-field voice interaction experience is improved.
Further, fig. 4 is a flowchart illustrating a third embodiment of the wake-up-free voice interaction method. As shown in fig. 4, the third embodiment of the present invention is proposed based on the second embodiment; in this embodiment, the step S30 specifically includes the following steps:
and step S31, acquiring the current conversation scene to which the current environment belongs, and substituting the current conversation scene into a preset conversation field model to acquire a current conversation scene field set.
It should be noted that the current dialog scene to which the current environment belongs may be a dialog scene determined by analyzing the target audio signal and combining the historical audio signal, and a current dialog scene field set may be obtained by substituting the current dialog scene into a preset dialog field model, where the preset dialog field model is a preset model for reflecting mapping relationships between different dialog scenes and different dialog scene field sets, and different dialog scenes correspond to different field sets.
And step S32, matching the complete intention vector with the current dialog scene field set to obtain fields which cannot be matched, and taking the fields which cannot be matched as irrelevant fields in the complete intention vector.
And matching the complete intention vector with the field set of the current conversation scene to obtain a result of successful matching and a result of failed matching, wherein the result of failed matching is a field which cannot be matched, and the field which cannot be matched is used as an irrelevant field in the complete intention vector.
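Steps S31 and S32 can be pictured as a dictionary lookup followed by a set difference. The scene names and field sets below are invented for the example; the patent only requires that the preset dialog field model map each dialog scene to its relevant fields.

```python
PRESET_DIALOG_FIELD_MODEL = {
    "living-room entertainment": {"movie", "music", "audio programs"},
    "smart-home control": {"device control", "weather"},
}

def filter_irrelevant_fields(intent_fields: set, current_scene: str) -> set:
    scene_fields = PRESET_DIALOG_FIELD_MODEL.get(current_scene, set())
    irrelevant = intent_fields - scene_fields    # fields that cannot be matched (step S32)
    return intent_fields - irrelevant            # keep only the scene-relevant fields

# filter_irrelevant_fields({"movie", "stock securities"}, "living-room entertainment") -> {"movie"}
```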
According to this scheme, the current dialog scene to which the current environment belongs is obtained and substituted into the preset dialog field model to obtain the current dialog scene field set; the complete intention vector is matched against the current dialog scene field set to obtain the fields that cannot be matched, and these are taken as the irrelevant fields in the complete intention vector. Filtering out the irrelevant fields effectively avoids their influence on voice interaction; the user can converse naturally with the voice device without a wake-up word, device start-up time is saved, and the user's far-field voice interaction experience is improved.
Based on the above embodiment of the wake-up-free voice interaction method, the present invention further provides a wake-up-free voice interaction apparatus.
Referring to fig. 5, fig. 5 is a functional block diagram of a wake-up free voice interaction apparatus according to a first embodiment of the present invention.
In a first embodiment of the wake-up-free voice interaction apparatus of the present invention, the wake-up-free voice interaction apparatus includes: an information acquisition module 10, a semantic recognition module 20, a filtering module 30 and a voice interaction module 40;
The information acquisition module 10 is configured to perform speech recognition on a target audio signal of the current environment to obtain text information.
The semantic recognition module 20 is configured to perform semantic recognition on the text information to obtain a complete intention vector.
The filtering module 30 is configured to obtain the current dialog scene to which the current environment belongs and filter out irrelevant fields in the complete intention vector according to the current dialog scene.
The voice interaction module 40 is configured to obtain an operation instruction according to the filtered complete intention vector and perform function control on the target device according to the operation instruction to realize voice interaction.
The steps implemented by each functional module of the wake-up-free voice interaction apparatus may refer to each embodiment of the wake-up-free voice interaction method of the present invention, and are not described herein again.
In addition, an embodiment of the present invention further provides a storage medium, where a wake-free voice interaction program is stored on the storage medium, and when executed by a processor, the wake-free voice interaction program implements the following operations:
performing speech recognition on a target audio signal of the current environment to obtain text information;
performing semantic recognition on the text information to obtain a complete intention vector;
obtaining the current dialog scene to which the current environment belongs, and filtering out irrelevant fields in the complete intention vector according to the current dialog scene;
and obtaining an operation instruction according to the filtered complete intention vector, and performing function control on the target device according to the operation instruction to realize voice interaction.
Further, the wake-free voice interaction program, when executed by the processor, further implements the following operations:
performing semantic recognition on the text information according to a preset intention word database to obtain a dialog object, a functional field and an instruction verb in the text information;
and determining a complete intention vector according to the dialog object, the functional field and the instruction verb.
Further, the wake-free voice interaction program, when executed by the processor, further implements the following operations:
obtaining, from a preset historical voice database, the historical functional field of the previous dialog adjacent to the current dialog;
matching the functional field against the historical functional field to generate a matching result;
and determining a complete intention vector according to the matching result, the dialog object, the functional field and the instruction verb.
Further, the wake-free voice interaction program, when executed by the processor, further implements the following operations:
when the matching result is that the functional field is the same as the historical functional field, taking the dialog object, the functional field and the instruction verb as the complete intention vector;
when the matching result is that the functional field differs from the historical functional field, judging whether the dialog object, the functional field and the instruction verb are vectors contained in a preset vector set;
and when the dialog object, the functional field and the instruction verb are vectors contained in the preset vector set, taking the dialog object, the functional field and the instruction verb as the complete intention vector.
Further, the wake-free voice interaction program, when executed by the processor, further implements the following operations:
obtaining the current dialog scene to which the current environment belongs, and substituting the current dialog scene into a preset dialog field model to obtain a current dialog scene field set;
and matching the complete intention vector against the current dialog scene field set to obtain the fields that cannot be matched, and taking the fields that cannot be matched as the irrelevant fields in the complete intention vector.
Further, the wake-free voice interaction program, when executed by the processor, further implements the following operations:
substituting the filtered complete intention vector into a preset instruction model to obtain a target instruction and taking the target instruction as the operation instruction, the preset instruction model being used to reflect the mapping relationship between each complete intention vector and an instruction;
and performing function control on the target device according to the operation instruction to realize voice interaction.
Further, the wake-free voice interaction program, when executed by the processor, further implements the following operations:
receiving the sound of the current environment through a microphone of the current device, and generating the target audio signal according to the sound of the current environment.
According to this scheme, speech recognition is performed on the target audio signal of the current environment to obtain text information; semantic recognition is performed on the text information to obtain a complete intention vector; the current dialog scene to which the current environment belongs is obtained, and irrelevant fields in the complete intention vector are filtered out according to the current dialog scene; an operation instruction is obtained according to the filtered complete intention vector, and function control is performed on the target device according to the operation instruction to realize voice interaction. By judging the complete intention vector, the influence of irrelevant conversation on voice interaction is avoided; the user can converse naturally with the voice device without a wake-up word, device start-up time is saved, and the user's far-field voice interaction experience is improved.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. A wake-up-free voice interaction method, characterized by comprising the following steps:
performing speech recognition on a target audio signal of the current environment to obtain text information;
performing semantic recognition on the text information according to a preset intention word database to obtain a complete intention vector, wherein the complete intention vector includes but is not limited to a dialog object, a functional field, an instruction verb and instruction entity parameters, and the dialog object is a preset imaginary persona representing the device;
obtaining the current dialog scene to which the current environment belongs, and filtering out irrelevant fields in the complete intention vector according to the current dialog scene;
and obtaining an operation instruction according to the filtered complete intention vector, and performing function control on the target device according to the operation instruction to realize voice interaction;
wherein the obtaining the current dialog scene to which the current environment belongs and filtering out irrelevant fields in the complete intention vector according to the current dialog scene includes:
obtaining the current dialog scene to which the current environment belongs, and substituting the current dialog scene into a preset dialog field model to obtain a current dialog scene field set;
and matching the complete intention vector against the current dialog scene field set to obtain the fields that cannot be matched, and taking the fields that cannot be matched as the irrelevant fields in the complete intention vector, wherein the current dialog scene is a dialog scene determined by analyzing the target audio signal in combination with historical audio signals.
2. The wake-up-free voice interaction method of claim 1, wherein the performing semantic recognition on the text information to obtain a complete intention vector comprises:
performing semantic recognition on the text information according to a preset intention word database to obtain a dialog object, a functional field and an instruction verb in the text information;
and determining a complete intention vector according to the dialog object, the functional field and the instruction verb.
3. The wake-up-free voice interaction method of claim 2, wherein the determining a complete intention vector according to the dialog object, the functional field and the instruction verb comprises:
obtaining, from a preset historical voice database, the historical functional field of the previous dialog adjacent to the current dialog;
matching the functional field against the historical functional field to generate a matching result;
and determining a complete intention vector according to the matching result, the dialog object, the functional field and the instruction verb.
4. The wake-up-free voice interaction method of claim 3, wherein the determining a complete intention vector according to the matching result, the dialog object, the functional field and the instruction verb comprises:
when the matching result is that the functional field is the same as the historical functional field, taking the dialog object, the functional field and the instruction verb as the complete intention vector;
when the matching result is that the functional field differs from the historical functional field, judging whether the dialog object, the functional field and the instruction verb are vectors contained in a preset vector set;
and when the dialog object, the functional field and the instruction verb are vectors contained in the preset vector set, taking the dialog object, the functional field and the instruction verb as the complete intention vector.
5. The wake-up-free voice interaction method of claim 1, wherein the obtaining an operation instruction according to the filtered complete intention vector and performing function control on the target device according to the operation instruction to realize voice interaction comprises:
substituting the filtered complete intention vector into a preset instruction model to obtain a target instruction and taking the target instruction as the operation instruction, the preset instruction model being used to reflect the mapping relationship between each complete intention vector and an instruction;
and performing function control on the target device according to the operation instruction to realize voice interaction.
6. The wake-up-free voice interaction method according to any one of claims 1 to 5, wherein before the performing speech recognition on the target audio signal of the current environment to obtain text information, the wake-up-free voice interaction method further comprises:
receiving the sound of the current environment through a microphone of the current device, and generating the target audio signal according to the sound of the current environment.
7. A wake-up-free voice interaction apparatus, characterized by comprising: an information acquisition module, a semantic recognition module, a filtering module and a voice interaction module;
the information acquisition module is configured to perform speech recognition on a target audio signal of the current environment to obtain text information;
the semantic recognition module is configured to perform semantic recognition on the text information according to a preset intention word database to obtain a complete intention vector, wherein the complete intention vector includes but is not limited to a dialog object, a functional field, an instruction verb and instruction entity parameters, and the dialog object is a preset imaginary persona representing the device;
the filtering module is configured to obtain the current dialog scene to which the current environment belongs and filter out irrelevant fields in the complete intention vector according to the current dialog scene;
the voice interaction module is configured to obtain an operation instruction according to the filtered complete intention vector and perform function control on the target device according to the operation instruction to realize voice interaction;
and the filtering module is further configured to obtain the current dialog scene to which the current environment belongs, substitute the current dialog scene into a preset dialog field model to obtain a current dialog scene field set, match the complete intention vector against the current dialog scene field set to obtain the fields that cannot be matched, and take the fields that cannot be matched as the irrelevant fields in the complete intention vector, wherein the current dialog scene is a dialog scene determined by analyzing the target audio signal in combination with historical audio signals.
8. A wake-up-free voice interaction device, characterized by comprising: a memory, a processor, and a wake-up-free voice interaction program stored in the memory and executable on the processor, the wake-up-free voice interaction program being configured to implement the steps of the wake-up-free voice interaction method according to any one of claims 1 to 6.
9. A storage medium, characterized in that the storage medium stores a wake-up-free voice interaction program which, when executed by a processor, implements the steps of the wake-up-free voice interaction method according to any one of claims 1 to 6.
CN201811464212.XA 2018-11-30 2018-11-30 Wake-up-free voice interaction method, device, equipment and storage medium Active CN109326289B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811464212.XA CN109326289B (en) 2018-11-30 2018-11-30 Wake-up-free voice interaction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811464212.XA CN109326289B (en) 2018-11-30 2018-11-30 Wake-up-free voice interaction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109326289A CN109326289A (en) 2019-02-12
CN109326289B true CN109326289B (en) 2021-10-22

Family

ID=65256817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811464212.XA Active CN109326289B (en) 2018-11-30 2018-11-30 Wake-up-free voice interaction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109326289B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110225386B (en) * 2019-05-09 2021-09-14 海信视像科技股份有限公司 Display control method and display device
CN111754989B (en) * 2019-05-28 2023-04-07 广东小天才科技有限公司 Avoiding method for voice false wake-up and electronic equipment
CN110047487B (en) * 2019-06-05 2022-03-18 广州小鹏汽车科技有限公司 Wake-up method and device for vehicle-mounted voice equipment, vehicle and machine-readable medium
CN112397060B (en) * 2019-07-31 2024-02-23 北京声智科技有限公司 Voice instruction processing method, system, equipment and medium
CN112397062A (en) 2019-08-15 2021-02-23 华为技术有限公司 Voice interaction method, device, terminal and storage medium
CN110647622A (en) * 2019-09-29 2020-01-03 北京金山安全软件有限公司 Interactive data validity identification method and device
CN110660385A (en) * 2019-09-30 2020-01-07 出门问问信息科技有限公司 Command word detection method and electronic equipment
CN112702469B (en) * 2019-10-23 2022-07-22 阿里巴巴集团控股有限公司 Voice interaction method and device, audio and video processing method and voice broadcasting method
US11594224B2 (en) 2019-12-04 2023-02-28 Samsung Electronics Co., Ltd. Voice user interface for intervening in conversation of at least one user by adjusting two different thresholds
CN111462741B (en) * 2020-03-02 2024-02-02 北京声智科技有限公司 Voice data processing method, device and storage medium
CN113393834B (en) * 2020-03-11 2024-04-16 阿里巴巴集团控股有限公司 Control method and device
CN112185374A (en) * 2020-09-07 2021-01-05 北京如影智能科技有限公司 Method and device for determining voice intention
CN112068795A (en) * 2020-09-10 2020-12-11 中航华东光电(上海)有限公司 Airborne screen brightness auxiliary control system and method based on intelligent voice
CN112233699B (en) * 2020-10-13 2023-04-28 中移(杭州)信息技术有限公司 Voice broadcasting method, intelligent voice equipment and computer readable storage medium
CN112230877A (en) * 2020-10-16 2021-01-15 惠州Tcl移动通信有限公司 Voice operation method and device, storage medium and electronic equipment
CN112347234A (en) * 2020-11-05 2021-02-09 北京羽扇智信息科技有限公司 Text display method and device
CN112614490B (en) * 2020-12-09 2024-04-16 北京罗克维尔斯科技有限公司 Method, device, medium, equipment, system and vehicle for generating voice instruction
CN112802452A (en) * 2020-12-21 2021-05-14 出门问问(武汉)信息科技有限公司 Junk instruction identification method and device
CN112527236A (en) * 2020-12-23 2021-03-19 北京梧桐车联科技有限责任公司 Voice mode control method, device and storage medium
CN112667076A (en) * 2020-12-23 2021-04-16 广州橙行智动汽车科技有限公司 Voice interaction data processing method and device
CN112802470A (en) * 2020-12-30 2021-05-14 厦门市美亚柏科信息股份有限公司 Offline voice control method and terminal
CN112951235B (en) * 2021-01-27 2022-08-16 北京云迹科技股份有限公司 Voice recognition method and device
CN112801239B (en) * 2021-01-28 2023-11-21 科大讯飞股份有限公司 Input recognition method, input recognition device, electronic equipment and storage medium
CN113330513B (en) * 2021-04-20 2024-08-27 华为技术有限公司 Voice information processing method and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268313A (en) * 2013-05-21 2013-08-28 北京云知声信息技术有限公司 Method and device for semantic analysis of natural language
CN107195303A (en) * 2017-06-16 2017-09-22 北京云知声信息技术有限公司 Method of speech processing and device
CN107492374A (en) * 2017-10-11 2017-12-19 深圳市汉普电子技术开发有限公司 A kind of sound control method, smart machine and storage medium
CN108364650A (en) * 2018-04-18 2018-08-03 北京声智科技有限公司 The adjusting apparatus and method of voice recognition result
CN108806674A (en) * 2017-05-05 2018-11-13 北京搜狗科技发展有限公司 A kind of positioning navigation method, device and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100679043B1 (en) * 2005-02-15 2007-02-05 삼성전자주식회사 Apparatus and method for spoken dialogue interface with task-structured frames

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268313A (en) * 2013-05-21 2013-08-28 北京云知声信息技术有限公司 Method and device for semantic analysis of natural language
CN108806674A (en) * 2017-05-05 2018-11-13 北京搜狗科技发展有限公司 A kind of positioning navigation method, device and electronic equipment
CN107195303A (en) * 2017-06-16 2017-09-22 北京云知声信息技术有限公司 Method of speech processing and device
CN107492374A (en) * 2017-10-11 2017-12-19 深圳市汉普电子技术开发有限公司 A kind of sound control method, smart machine and storage medium
CN108364650A (en) * 2018-04-18 2018-08-03 北京声智科技有限公司 The adjusting apparatus and method of voice recognition result

Also Published As

Publication number Publication date
CN109326289A (en) 2019-02-12

Similar Documents

Publication Publication Date Title
CN109326289B (en) Wake-up-free voice interaction method, device, equipment and storage medium
CN111508474B (en) Voice interruption method, electronic equipment and storage device
CN108962262B (en) Voice data processing method and device
CN102568478B (en) Video play control method and system based on voice recognition
CN107644638B (en) Audio recognition method, device, terminal and computer readable storage medium
CN109584860B (en) Voice wake-up word definition method and system
CN112201246B (en) Intelligent control method and device based on voice, electronic equipment and storage medium
CN107145329A (en) Apparatus control method, device and smart machine
CN110047481B (en) Method and apparatus for speech recognition
CN111161714B (en) Voice information processing method, electronic equipment and storage medium
CN111261151B (en) Voice processing method and device, electronic equipment and storage medium
CN109994106B (en) Voice processing method and equipment
CN102999161A (en) Implementation method and application of voice awakening module
CN116844543A (en) Control method and system based on voice interaction
CN111916068B (en) Audio detection method and device
CN111145763A (en) GRU-based voice recognition method and system in audio
CN110473542B (en) Awakening method and device for voice instruction execution function and electronic equipment
CN113593565B (en) Intelligent home device management and control method and system
CN111862943B (en) Speech recognition method and device, electronic equipment and storage medium
CN108492826B (en) Audio processing method and device, intelligent equipment and medium
CN109065026B (en) Recording control method and device
CN116391225A (en) Method and system for assigning unique voices to electronic devices
CN118280356A (en) Voice interaction method, electronic equipment, vehicle and storage medium
CN116016779A (en) Voice call translation assisting method, system, computer equipment and storage medium
KR20240090400A (en) Continuous conversation based on digital signal processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant