CN109326289B - Wake-up-free voice interaction method, device, equipment and storage medium - Google Patents

Wake-up-free voice interaction method, device, equipment and storage medium

Info

Publication number
CN109326289B
CN109326289B (application CN201811464212.XA)
Authority
CN
China
Prior art keywords
voice interaction
instruction
current
wake
dialog
Prior art date
Legal status
Active
Application number
CN201811464212.XA
Other languages
Chinese (zh)
Other versions
CN109326289A (en)
Inventor
姚凯
Current Assignee
Shenzhen Skyworth Digital Technology Co Ltd
Original Assignee
Shenzhen Skyworth Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Skyworth Digital Technology Co Ltd filed Critical Shenzhen Skyworth Digital Technology Co Ltd
Priority to CN201811464212.XA
Publication of CN109326289A
Application granted
Publication of CN109326289B

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26 - Speech to text systems
    • G10L 2015/223 - Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a wake-up-free voice interaction method, apparatus, device and storage medium. Speech recognition is performed on a target audio signal of the current environment to obtain text information; semantic recognition is performed on the text information to obtain a complete intention vector; the current dialog scene to which the current environment belongs is obtained, and irrelevant fields in the complete intention vector are filtered out according to the current dialog scene; an operation instruction is obtained according to the filtered complete intention vector, and function control is performed on the target device according to the operation instruction to realize voice interaction. By judging the complete intention vector, the influence of irrelevant conversation on voice interaction is avoided, so the user can converse naturally with the voice device without a wake-up word, device start-up time is saved, and the user's far-field voice interaction experience is improved.

Description

Wake-up-free voice interaction method, device, equipment and storage medium
Technical Field
The present invention relates to the field of speech recognition, and in particular, to a wake-up-free voice interaction method, apparatus, device, and storage medium.
Background
In existing far-field intelligent voice products, the user must wake the product before using it, that is, start its recognition function with a customized command word before any further interaction can take place. For example, the user first says a wake-up word (such as "Xiaodu"), the device gives feedback, and the user then continues with "I want to watch a movie"; the wake-up word itself is only a trigger, and after the device receives and executes the instruction it closes the interaction and waits for the next wake-up.
Relying on a wake-up word as a mandatory first step harms the experience of many far-field voice interactions. The two common problems are failure to wake and false wake-up: a failure to wake means the subsequent command cannot proceed, while a false wake-up occurs when the user has given no instruction but the device, disturbed by ambient sound, mistakenly believes an instruction exists.
Disclosure of Invention
The main object of the present invention is to provide a wake-up-free voice interaction method, apparatus, device and storage medium, aiming to solve the technical problem in the prior art that waking a voice product with a wake-up word easily leads to false wake-ups or failures to wake, so that the user's intention cannot be clearly understood and the user's voice interaction experience is poor.
In order to achieve the above object, the present invention provides a wake-up free voice interaction method, which includes the following steps:
performing speech recognition on a target audio signal of the current environment to obtain text information;
performing semantic recognition on the text information to obtain a complete intention vector;
obtaining the current dialog scene to which the current environment belongs, and filtering out irrelevant fields in the complete intention vector according to the current dialog scene;
and obtaining an operation instruction according to the filtered complete intention vector, and performing function control on the target device according to the operation instruction to realize voice interaction.
Preferably, the performing semantic recognition on the text information to obtain a complete intention vector includes:
performing semantic recognition on the text information according to a preset intention word database to obtain a dialog object, a functional field and an instruction verb in the text information;
and determining a complete intention vector according to the dialog object, the functional field and the instruction verb.
Preferably, the determining a complete intention vector according to the dialog object, the functional field and the instruction verb includes:
obtaining, from a preset historical voice database, the historical functional field of the previous dialog adjacent to the current dialog;
matching the functional field against the historical functional field to generate a matching result;
and determining a complete intention vector according to the matching result, the dialog object, the functional field and the instruction verb.
Preferably, the determining a complete intention vector according to the matching result, the dialog object, the functional field and the instruction verb includes:
when the matching result is that the functional field is the same as the historical functional field, taking the dialog object, the functional field and the instruction verb as the complete intention vector;
when the matching result is that the functional field differs from the historical functional field, judging whether the dialog object, the functional field and the instruction verb are vectors contained in a preset vector set;
and when the dialog object, the functional field and the instruction verb are vectors contained in the preset vector set, taking the dialog object, the functional field and the instruction verb as the complete intention vector.
Preferably, the obtaining the current dialog scene to which the current environment belongs and filtering out irrelevant fields in the complete intention vector according to the current dialog scene includes:
obtaining the current dialog scene to which the current environment belongs, and substituting the current dialog scene into a preset dialog field model to obtain a current dialog scene field set;
and matching the complete intention vector against the current dialog scene field set to obtain the fields that cannot be matched, and taking the fields that cannot be matched as the irrelevant fields in the complete intention vector.
Preferably, the obtaining an operation instruction according to the filtered complete intention vector and performing function control on the target device according to the operation instruction to realize voice interaction includes:
substituting the filtered complete intention vector into a preset instruction model to obtain a target instruction and taking the target instruction as the operation instruction, the preset instruction model being used to reflect the mapping relationship between each complete intention vector and an instruction;
and performing function control on the target device according to the operation instruction to realize voice interaction.
Preferably, before the performing speech recognition on the target audio signal of the current environment to obtain text information, the wake-up-free voice interaction method further includes:
receiving the sound of the current environment through a microphone of the current device, and generating the target audio signal according to the sound of the current environment.
In addition, to achieve the above object, the present invention further provides a wake-up-free voice interaction device, the wake-up-free voice interaction device including: a memory, a processor, and a wake-up-free voice interaction program stored in the memory and executable on the processor, the wake-up-free voice interaction program being configured to implement the steps of the wake-up-free voice interaction method described above.
In addition, to achieve the above object, the present invention further provides a storage medium storing a wake-up-free voice interaction program which, when executed by a processor, implements the steps of the wake-up-free voice interaction method described above.
In addition, to achieve the above object, the present invention further provides a wake-up-free voice interaction apparatus, including: an information acquisition module, a semantic recognition module, a filtering module and a voice interaction module;
the information acquisition module is configured to perform speech recognition on a target audio signal of the current environment to obtain text information;
the semantic recognition module is configured to perform semantic recognition on the text information to obtain a complete intention vector;
the filtering module is configured to obtain the current dialog scene to which the current environment belongs and filter out irrelevant fields in the complete intention vector according to the current dialog scene;
and the voice interaction module is configured to obtain an operation instruction according to the filtered complete intention vector and perform function control on the target device according to the operation instruction to realize voice interaction.
With the wake-up-free voice interaction method provided by the invention, speech recognition is performed on the target audio signal of the current environment to obtain text information; semantic recognition is performed on the text information to obtain a complete intention vector; the current dialog scene to which the current environment belongs is obtained, and irrelevant fields in the complete intention vector are filtered out according to the current dialog scene; an operation instruction is obtained according to the filtered complete intention vector, and function control is performed on the target device according to the operation instruction to realize voice interaction. By judging the complete intention vector, the influence of irrelevant conversation on voice interaction is avoided; the user can converse naturally with the voice device without a wake-up word, device start-up time is saved, and the user's far-field voice interaction experience is improved.
Drawings
Fig. 1 is a schematic structural diagram of a wake-up-free voice interaction device in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a wake-up free voice interaction method according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a wake-up free voice interaction method according to a second embodiment of the present invention;
FIG. 4 is a flowchart illustrating a wake-up free voice interaction method according to a third embodiment of the present invention;
fig. 5 is a functional block diagram of a wake-up free voice interaction apparatus according to a first embodiment of the present invention.
The objects, features and advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The solution of the embodiments of the invention is mainly as follows: speech recognition is performed on a target audio signal of the current environment to obtain text information; semantic recognition is performed on the text information to obtain a complete intention vector; the current dialog scene to which the current environment belongs is obtained, and irrelevant fields in the complete intention vector are filtered out according to the current dialog scene; an operation instruction is obtained according to the filtered complete intention vector, and function control is performed on the target device according to the operation instruction to realize voice interaction. By judging the complete intention vector, the influence of irrelevant conversation on voice interaction can be avoided, and the user can converse naturally with the voice device without a wake-up word, which saves device start-up time and improves the user's far-field voice interaction experience. This solves the technical problem in the prior art that waking a voice product with a wake-up word easily causes false wake-ups or failures to wake, so that the user's intention cannot be clearly understood and the voice interaction experience is poor.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a wake-up free voice interaction device in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the wake-up-free voice interaction device may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a standard wired interface or a wireless interface. The network interface 1004 may optionally include a standard wired interface or a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will understand that the wake-up-free voice interaction device structure shown in fig. 1 does not constitute a limitation of the device, which may include more or fewer components than those shown, combine certain components, or arrange the components differently.
As shown in fig. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module, and a wake-up-free voice interaction program. In the wake-up-free voice interaction device shown in fig. 1, the processor 1001 may call the wake-up-free voice interaction program stored in the memory 1005 and perform the following operations:
performing speech recognition on a target audio signal of the current environment to obtain text information;
performing semantic recognition on the text information to obtain a complete intention vector;
obtaining the current dialog scene to which the current environment belongs, and filtering out irrelevant fields in the complete intention vector according to the current dialog scene;
and obtaining an operation instruction according to the filtered complete intention vector, and performing function control on the target device according to the operation instruction to realize voice interaction.
Further, the processor 1001 may call the wake-free voice interaction program stored in the memory 1005, and further perform the following operations:
performing semantic recognition on the text information according to a preset intention word database to obtain a dialog object, a functional field and an instruction verb in the text information;
and determining a complete intention vector according to the dialog object, the functional field and the instruction verb.
Further, the processor 1001 may call the wake-free voice interaction program stored in the memory 1005, and further perform the following operations:
obtaining, from a preset historical voice database, the historical functional field of the previous dialog adjacent to the current dialog;
matching the functional field against the historical functional field to generate a matching result;
and determining a complete intention vector according to the matching result, the dialog object, the functional field and the instruction verb.
Further, the processor 1001 may call the wake-free voice interaction program stored in the memory 1005, and further perform the following operations:
when the matching result is that the functional field is the same as the historical functional field, taking the dialog object, the functional field and the instruction verb as the complete intention vector;
when the matching result is that the functional field differs from the historical functional field, judging whether the dialog object, the functional field and the instruction verb are vectors contained in a preset vector set;
and when the dialog object, the functional field and the instruction verb are vectors contained in the preset vector set, taking the dialog object, the functional field and the instruction verb as the complete intention vector.
Further, the processor 1001 may call the wake-free voice interaction program stored in the memory 1005, and further perform the following operations:
obtaining the current dialog scene to which the current environment belongs, and substituting the current dialog scene into a preset dialog field model to obtain a current dialog scene field set;
and matching the complete intention vector against the current dialog scene field set to obtain the fields that cannot be matched, and taking the fields that cannot be matched as the irrelevant fields in the complete intention vector.
Further, the processor 1001 may call the wake-free voice interaction program stored in the memory 1005, and further perform the following operations:
substituting the filtered complete intention vector into a preset instruction model to obtain a target instruction and taking the target instruction as the operation instruction, the preset instruction model being used to reflect the mapping relationship between each complete intention vector and an instruction;
and performing function control on the target device according to the operation instruction to realize voice interaction.
Further, the processor 1001 may call the wake-free voice interaction program stored in the memory 1005, and further perform the following operations:
receiving the sound of the current environment through a microphone of the current device, and generating the target audio signal according to the sound of the current environment.
According to this scheme, speech recognition is performed on the target audio signal of the current environment to obtain text information; semantic recognition is performed on the text information to obtain a complete intention vector; the current dialog scene to which the current environment belongs is obtained, and irrelevant fields in the complete intention vector are filtered out according to the current dialog scene; an operation instruction is obtained according to the filtered complete intention vector, and function control is performed on the target device according to the operation instruction to realize voice interaction. By judging the complete intention vector, the influence of irrelevant conversation on voice interaction is avoided; the user can converse naturally with the voice device without a wake-up word, device start-up time is saved, and the user's far-field voice interaction experience is improved.
Based on the hardware structure, the embodiment of the wake-up-free voice interaction method is provided.
Referring to fig. 2, fig. 2 is a flowchart illustrating a wake-up free voice interaction method according to a first embodiment of the present invention.
In a first embodiment, the wake-free voice interaction method includes the following steps:
and step S10, performing voice recognition on the target audio signal of the current environment to obtain character information.
It should be noted that, the target audio signal of the current environment is an audio signal collection mode corresponding to the sound collected by the current environment, the sound of the current environment can be collected by the microphone device and converted into a corresponding audio signal, and the text information can be obtained by performing speech recognition on the target audio signal.
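As an illustration of step S10, the following sketch converts a recorded audio file into text. The patent does not name a recognition engine; the SpeechRecognition Python package with its Google Web Speech backend is used here only as an assumed stand-in, and the file path and language code are likewise illustrative.

```python
# Illustrative only: the patent does not specify an ASR engine.
import speech_recognition as sr

def audio_to_text(wav_path: str, language: str = "zh-CN") -> str:
    """Transcribe a WAV recording into text information (step S10 analogue)."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)          # read the whole recording
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return ""                                  # nothing intelligible was recognized
```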
Further, before the step S10, the wake-free voice interaction method further includes the following steps:
the method comprises the steps of receiving sound of the current environment through a microphone of the current equipment, and generating a target audio signal according to the sound of the current environment.
It can be understood that the microphone of the current device can receive the sound of the current environment in real time, and then generate a corresponding target audio signal according to the sound of the current environment, generally, sound source localization and adaptive beam implementation can be performed through the microphone array, and after the target audio signal is generated, the influence caused by interference sounds such as noise, reverberation and echo can be solved through noise reduction processing.
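The capture step above can be pictured with a minimal delay-and-sum beamformer over a microphone-array buffer. This is a hedged numpy sketch, not the patent's implementation: the array geometry, the per-channel delays (normally derived from sound source localization) and the absence of echo and noise suppression are all simplifying assumptions.

```python
import numpy as np

def delay_and_sum(mics: np.ndarray, delays: np.ndarray) -> np.ndarray:
    """mics: (n_channels, n_samples) buffer; delays: integer sample delays per channel."""
    n_ch, n_samp = mics.shape
    out = np.zeros(n_samp)
    for ch in range(n_ch):
        d = int(delays[ch])
        # advance each channel so wavefronts from the target direction line up
        out[: n_samp - d] += mics[ch, d:]
    return out / n_ch  # averaged, beamformed mono signal
```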
Step S20: performing semantic recognition on the text information to obtain a complete intention vector.
It can be understood that a complete intention vector can be obtained by performing semantic recognition on the text information. The complete intention vector includes, but is not limited to, a dialog object, a functional field, an instruction verb and instruction entity parameters. The dialog object is a preset imaginary persona representing the device and can be customized for different scenes; for example, the persona that manages smart-home devices may be called "grandfather", the one that helps with movie search "sprite", and the one that answers knowledge questions "majacobian". The functional field is a preset concrete function implemented by the device, including but not limited to "movie", "music", "device control", "weather", "news", "stock securities" and "audio programs". The instruction verb is a preset control verb for controlling the target device, including but not limited to "search", "play", "control sound", "control progress", "query", "order" and "payment". The instruction entity is a preset entity name existing in each functional field, including but not limited to "city", "time", "article", "video and audio work", "company name" and "celebrity".
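A possible in-memory representation of the complete intention vector described above is sketched below; the field names and example values are assumptions made for illustration, not a data format defined by the patent.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class IntentVector:
    dialog_object: Optional[str]        # e.g. "sprite" for the movie-search persona
    functional_field: str               # e.g. "movie", "music", "device control"
    instruction_verb: str               # e.g. "search", "play", "query"
    entity_params: dict = field(default_factory=dict)  # e.g. {"video and audio work": "..."}

# "Sprite, play <some film>" might be parsed into:
example = IntentVector("sprite", "movie", "play", {"video and audio work": "some film"})
```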
Step S30: obtaining the current dialog scene to which the current environment belongs, and filtering out irrelevant fields in the complete intention vector according to the current dialog scene.
It should be understood that the current dialog scene is the dialog scene to which the current environment belongs, and different dialog scenes correspond to different field sets. A corresponding target field set can be obtained from the dialog scene, and whether an irrelevant field exists in the complete intention vector can be determined by checking whether each field in the complete intention vector can be matched against the target field set. Filtering the complete intention vector avoids interference of irrelevant fields with the voice interaction and improves its speed and efficiency.
Step S40: obtaining an operation instruction according to the filtered complete intention vector, and performing function control on the target device according to the operation instruction to realize voice interaction.
It can be understood that the operation instruction may be a special field or a segment of an executable program. A corresponding operation instruction can be generated according to the filtered complete intention vector, and the function control corresponding to the operation instruction can then be performed on the target device, thereby realizing voice interaction.
Further, the step S40 includes the following steps:
substituting the filtered complete intention vector into a preset instruction model to obtain a target instruction and taking the target instruction as the operation instruction, the preset instruction model being used to reflect the mapping relationship between each complete intention vector and an instruction;
and performing function control on the target device according to the operation instruction to realize voice interaction.
It should be understood that the preset instruction model is a preset model for obtaining the relevant operation instruction and reflects the mapping relationship between each complete intention vector and an instruction. A target instruction can be obtained by substituting the filtered complete intention vector into the preset instruction model, and this target instruction serves as the operation instruction. Performing function control on the target device through the operation instruction then realizes voice interaction; the function control is generally direct playback or a further voice confirmation, but may of course be other function control, which is not limited in this embodiment.
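One way to picture the preset instruction model is a lookup table keyed by (functional field, instruction verb) that returns a device operation. This is a sketch under assumed names; the patent does not prescribe the keying scheme or the handler functions.

```python
from typing import Callable, Dict, Tuple

def play_media(params: dict) -> str:
    return f"playing {params.get('video and audio work', 'selected content')}"

def query_weather(params: dict) -> str:
    return f"reporting weather for {params.get('city', 'the current city')}"

# preset instruction model: mapping from intention-vector components to an operation
PRESET_INSTRUCTION_MODEL: Dict[Tuple[str, str], Callable[[dict], str]] = {
    ("movie", "play"): play_media,
    ("weather", "query"): query_weather,
}

def execute(functional_field: str, instruction_verb: str, entity_params: dict) -> str:
    handler = PRESET_INSTRUCTION_MODEL.get((functional_field, instruction_verb))
    if handler is None:
        return "no matching operation instruction"   # outside the supported mappings
    return handler(entity_params)                    # function control of the target device
```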
According to this scheme, speech recognition is performed on the target audio signal of the current environment to obtain text information; semantic recognition is performed on the text information to obtain a complete intention vector; the current dialog scene to which the current environment belongs is obtained, and irrelevant fields in the complete intention vector are filtered out according to the current dialog scene; an operation instruction is obtained according to the filtered complete intention vector, and function control is performed on the target device according to the operation instruction to realize voice interaction. By judging the complete intention vector, the influence of irrelevant conversation on voice interaction is avoided; the user can converse naturally with the voice device without a wake-up word, device start-up time is saved, and the user's far-field voice interaction experience is improved.
Further, fig. 3 is a flowchart illustrating a second embodiment of the wake-up-free voice interaction method, and as shown in fig. 3, the second embodiment of the wake-up-free voice interaction method is proposed based on the first embodiment, in this embodiment, the step S10 specifically includes the following steps:
and step S11, performing semantic recognition on the character information according to a preset intention word database, and acquiring a dialog object, a function field and an instruction verb in the character information.
It should be noted that the preset intention database is a preset database for storing various intention words, and includes a text extraction framework, and a dialog object, a function field and an instruction verb in the text information can be obtained through the text extraction framework; the intention words are words or fields containing voice interaction intentions, semantic recognition is carried out on the text information according to the preset intention word database, and conversation objects, the function fields and the instruction verbs in the text information can be obtained.
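For step S11, a toy version of the lookup against a preset intention word database might look as follows. The three small collections stand in for the database and the English keywords are invented for the example; a real system would rely on a proper text-extraction framework.

```python
DIALOG_OBJECTS = {"grandfather", "sprite"}
FUNCTIONAL_FIELDS = {"movie": ["movie", "film"], "weather": ["weather", "rain"]}
INSTRUCTION_VERBS = {"play", "search", "query"}

def extract_intent_words(text: str):
    """Return (dialog_object, functional_field, instruction_verb); any item may be None."""
    words = text.lower().split()
    dialog_object = next((w for w in words if w in DIALOG_OBJECTS), None)
    verb = next((w for w in words if w in INSTRUCTION_VERBS), None)
    field_name = next(
        (name for name, keywords in FUNCTIONAL_FIELDS.items()
         if any(k in words for k in keywords)),
        None,
    )
    return dialog_object, field_name, verb

# extract_intent_words("sprite play a movie") -> ("sprite", "movie", "play")
```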
Step S12: determining a complete intention vector according to the dialog object, the functional field and the instruction verb.
It can be understood that a complete intention vector can be determined from the dialog object, the functional field and the instruction verb, i.e., the dialog object, the functional field and the instruction verb together form the complete intention vector.
Further, the step S12 includes the following steps:
obtaining, from a preset historical voice database, the historical functional field of the previous dialog adjacent to the current dialog;
matching the functional field against the historical functional field to generate a matching result;
and determining a complete intention vector according to the matching result, the dialog object, the functional field and the instruction verb.
It should be understood that the preset historical voice database is a database in which the target device stores historical voice data over a preset time period. The historical voice data of the previous dialog adjacent to the current dialog can be obtained from this database, and the historical functional field can be extracted from that data; the historical functional field may also be obtained in other ways.
It can be understood that matching the functional field against the historical functional field generates a corresponding matching result, and by analyzing the matching result, a complete intention vector can be determined from the dialog object, the functional field and the instruction verb.
Further, the step of determining a complete intention vector according to the matching result, the dialog object, the functional field and the instruction verb includes:
when the matching result is that the functional field is the same as the historical functional field, taking the dialog object, the functional field and the instruction verb as the complete intention vector;
when the matching result is that the functional field differs from the historical functional field, judging whether the dialog object, the functional field and the instruction verb are vectors contained in a preset vector set;
and when the dialog object, the functional field and the instruction verb are vectors contained in the preset vector set, taking the dialog object, the functional field and the instruction verb as the complete intention vector.
It should be understood that when the matching result indicates that the functional field is not the same as the historical functional field, that is, when the matching fails, it is judged whether the dialog object, the functional field and the instruction verb are vectors contained in a preset vector set. The preset vector set is a preset set of supported vectors; by comparing the dialog object, the functional field and the instruction verb against the vectors in the preset vector set, the complete intention vector can be determined from the comparison result.
In a specific implementation, when the functional field of the current text information differs from the historical functional field, it is judged whether the dialog object, the functional field and the instruction verb are vectors contained in the preset vector set, that is, whether the target device supports them; this determines whether they are relevant or irrelevant to the target device, and hence whether a complete intention vector exists. When the dialog object, the functional field and the instruction verb are vectors contained in the preset vector set, they are taken as the complete intention vector. When the matching result is that the functional field is the same as the historical functional field, that is, when the matching succeeds, the dialog object, the functional field and the instruction verb are directly taken as the complete intention vector. The dialog object may be absent; as long as the instruction verb is present, the remaining components can still be taken as the complete intention vector.
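The decision just described can be summarized in a few lines. Only the control flow (accept a continuation of the previous dialog's field, otherwise check membership in the preset vector set) reflects the text; the contents of the set are assumptions for illustration.

```python
PRESET_VECTOR_SET = {
    ("sprite", "movie", "play"),          # assumed entries, for illustration only
    (None, "device control", "query"),
}

def decide_complete_intent(dialog_object, functional_field, instruction_verb, history_field):
    candidate = (dialog_object, functional_field, instruction_verb)
    if functional_field == history_field:
        return candidate                  # continuation of the previous dialog's field
    if candidate in PRESET_VECTOR_SET:
        return candidate                  # topic changed, but the device supports it
    return None                           # treated as irrelevant conversation
```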
According to this scheme, semantic recognition is performed on the text information according to a preset intention word database to obtain the dialog object, the functional field and the instruction verb in the text information, and a complete intention vector is determined from them. This improves the accuracy of voice interaction function control and avoids interference from irrelevant information; the user can converse naturally with the voice device without a wake-up word, device start-up time is saved, and the user's far-field voice interaction experience is improved.
Further, fig. 4 is a flowchart illustrating a third embodiment of the wake-up-free voice interaction method. As shown in fig. 4, the third embodiment of the present invention is proposed based on the second embodiment; in this embodiment, the step S30 specifically includes the following steps:
and step S31, acquiring the current conversation scene to which the current environment belongs, and substituting the current conversation scene into a preset conversation field model to acquire a current conversation scene field set.
It should be noted that the current dialog scene to which the current environment belongs may be a dialog scene determined by analyzing the target audio signal and combining the historical audio signal, and a current dialog scene field set may be obtained by substituting the current dialog scene into a preset dialog field model, where the preset dialog field model is a preset model for reflecting mapping relationships between different dialog scenes and different dialog scene field sets, and different dialog scenes correspond to different field sets.
And step S32, matching the complete intention vector with the current dialog scene field set to obtain fields which cannot be matched, and taking the fields which cannot be matched as irrelevant fields in the complete intention vector.
And matching the complete intention vector with the field set of the current conversation scene to obtain a result of successful matching and a result of failed matching, wherein the result of failed matching is a field which cannot be matched, and the field which cannot be matched is used as an irrelevant field in the complete intention vector.
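Steps S31 and S32 can be pictured as a dictionary lookup followed by a set difference. The scene names and field sets below are invented for the example; the patent only requires that the preset dialog field model map each dialog scene to its relevant fields.

```python
PRESET_DIALOG_FIELD_MODEL = {
    "living-room entertainment": {"movie", "music", "audio programs"},
    "smart-home control": {"device control", "weather"},
}

def filter_irrelevant_fields(intent_fields: set, current_scene: str) -> set:
    scene_fields = PRESET_DIALOG_FIELD_MODEL.get(current_scene, set())
    irrelevant = intent_fields - scene_fields    # fields that cannot be matched (step S32)
    return intent_fields - irrelevant            # keep only the scene-relevant fields

# filter_irrelevant_fields({"movie", "stock securities"}, "living-room entertainment") -> {"movie"}
```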
According to this scheme, the current dialog scene to which the current environment belongs is obtained and substituted into the preset dialog field model to obtain the current dialog scene field set; the complete intention vector is matched against the current dialog scene field set to obtain the fields that cannot be matched, and these are taken as the irrelevant fields in the complete intention vector. Filtering out the irrelevant fields effectively avoids their influence on voice interaction; the user can converse naturally with the voice device without a wake-up word, device start-up time is saved, and the user's far-field voice interaction experience is improved.
Based on the above embodiment of the wake-up-free voice interaction method, the present invention further provides a wake-up-free voice interaction apparatus.
Referring to fig. 5, fig. 5 is a functional block diagram of a wake-up free voice interaction apparatus according to a first embodiment of the present invention.
In a first embodiment of the wake-up-free voice interaction apparatus of the present invention, the wake-up-free voice interaction apparatus includes: an information acquisition module 10, a semantic recognition module 20, a filtering module 30 and a voice interaction module 40;
The information acquisition module 10 is configured to perform speech recognition on a target audio signal of the current environment to obtain text information.
The semantic recognition module 20 is configured to perform semantic recognition on the text information to obtain a complete intention vector.
The filtering module 30 is configured to obtain the current dialog scene to which the current environment belongs and filter out irrelevant fields in the complete intention vector according to the current dialog scene.
The voice interaction module 40 is configured to obtain an operation instruction according to the filtered complete intention vector and perform function control on the target device according to the operation instruction to realize voice interaction.
The steps implemented by each functional module of the wake-up-free voice interaction apparatus may refer to each embodiment of the wake-up-free voice interaction method of the present invention, and are not described herein again.
In addition, an embodiment of the present invention further provides a storage medium, where a wake-free voice interaction program is stored on the storage medium, and when executed by a processor, the wake-free voice interaction program implements the following operations:
performing speech recognition on a target audio signal of the current environment to obtain text information;
performing semantic recognition on the text information to obtain a complete intention vector;
obtaining the current dialog scene to which the current environment belongs, and filtering out irrelevant fields in the complete intention vector according to the current dialog scene;
and obtaining an operation instruction according to the filtered complete intention vector, and performing function control on the target device according to the operation instruction to realize voice interaction.
Further, the wake-free voice interaction program, when executed by the processor, further implements the following operations:
performing semantic recognition on the text information according to a preset intention word database to obtain a dialog object, a functional field and an instruction verb in the text information;
and determining a complete intention vector according to the dialog object, the functional field and the instruction verb.
Further, the wake-free voice interaction program, when executed by the processor, further implements the following operations:
obtaining, from a preset historical voice database, the historical functional field of the previous dialog adjacent to the current dialog;
matching the functional field against the historical functional field to generate a matching result;
and determining a complete intention vector according to the matching result, the dialog object, the functional field and the instruction verb.
Further, the wake-free voice interaction program, when executed by the processor, further implements the following operations:
when the matching result is that the functional field is the same as the historical functional field, taking the dialog object, the functional field and the instruction verb as the complete intention vector;
when the matching result is that the functional field differs from the historical functional field, judging whether the dialog object, the functional field and the instruction verb are vectors contained in a preset vector set;
and when the dialog object, the functional field and the instruction verb are vectors contained in the preset vector set, taking the dialog object, the functional field and the instruction verb as the complete intention vector.
Further, the wake-free voice interaction program, when executed by the processor, further implements the following operations:
obtaining the current dialog scene to which the current environment belongs, and substituting the current dialog scene into a preset dialog field model to obtain a current dialog scene field set;
and matching the complete intention vector against the current dialog scene field set to obtain the fields that cannot be matched, and taking the fields that cannot be matched as the irrelevant fields in the complete intention vector.
Further, the wake-free voice interaction program, when executed by the processor, further implements the following operations:
substituting the filtered complete intention vector into a preset instruction model to obtain a target instruction and taking the target instruction as the operation instruction, the preset instruction model being used to reflect the mapping relationship between each complete intention vector and an instruction;
and performing function control on the target device according to the operation instruction to realize voice interaction.
Further, the wake-free voice interaction program, when executed by the processor, further implements the following operations:
receiving the sound of the current environment through a microphone of the current device, and generating the target audio signal according to the sound of the current environment.
According to this scheme, speech recognition is performed on the target audio signal of the current environment to obtain text information; semantic recognition is performed on the text information to obtain a complete intention vector; the current dialog scene to which the current environment belongs is obtained, and irrelevant fields in the complete intention vector are filtered out according to the current dialog scene; an operation instruction is obtained according to the filtered complete intention vector, and function control is performed on the target device according to the operation instruction to realize voice interaction. By judging the complete intention vector, the influence of irrelevant conversation on voice interaction is avoided; the user can converse naturally with the voice device without a wake-up word, device start-up time is saved, and the user's far-field voice interaction experience is improved.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. A wake-up-free voice interaction method, characterized by comprising the following steps:
performing speech recognition on a target audio signal of the current environment to obtain text information;
performing semantic recognition on the text information according to a preset intention word database to obtain a complete intention vector, wherein the complete intention vector includes but is not limited to a dialog object, a functional field, an instruction verb and instruction entity parameters, and the dialog object is a preset imaginary persona representing the device;
obtaining the current dialog scene to which the current environment belongs, and filtering out irrelevant fields in the complete intention vector according to the current dialog scene;
and obtaining an operation instruction according to the filtered complete intention vector, and performing function control on the target device according to the operation instruction to realize voice interaction;
wherein the obtaining the current dialog scene to which the current environment belongs and filtering out irrelevant fields in the complete intention vector according to the current dialog scene includes:
obtaining the current dialog scene to which the current environment belongs, and substituting the current dialog scene into a preset dialog field model to obtain a current dialog scene field set;
and matching the complete intention vector against the current dialog scene field set to obtain the fields that cannot be matched, and taking the fields that cannot be matched as the irrelevant fields in the complete intention vector, wherein the current dialog scene is a dialog scene determined by analyzing the target audio signal in combination with historical audio signals.
2. The wake-up-free voice interaction method of claim 1, wherein the performing semantic recognition on the text information to obtain a complete intention vector comprises:
performing semantic recognition on the text information according to a preset intention word database to obtain a dialog object, a functional field and an instruction verb in the text information;
and determining a complete intention vector according to the dialog object, the functional field and the instruction verb.
3. The wake-up-free voice interaction method of claim 2, wherein the determining a complete intention vector according to the dialog object, the functional field and the instruction verb comprises:
obtaining, from a preset historical voice database, the historical functional field of the previous dialog adjacent to the current dialog;
matching the functional field against the historical functional field to generate a matching result;
and determining a complete intention vector according to the matching result, the dialog object, the functional field and the instruction verb.
4. The wake-up-free voice interaction method of claim 3, wherein the determining a complete intention vector according to the matching result, the dialog object, the functional field and the instruction verb comprises:
when the matching result is that the functional field is the same as the historical functional field, taking the dialog object, the functional field and the instruction verb as the complete intention vector;
when the matching result is that the functional field differs from the historical functional field, judging whether the dialog object, the functional field and the instruction verb are vectors contained in a preset vector set;
and when the dialog object, the functional field and the instruction verb are vectors contained in the preset vector set, taking the dialog object, the functional field and the instruction verb as the complete intention vector.
5. The wake-up-free voice interaction method of claim 1, wherein the obtaining an operation instruction according to the filtered complete intention vector and performing function control on the target device according to the operation instruction to realize voice interaction comprises:
substituting the filtered complete intention vector into a preset instruction model to obtain a target instruction and taking the target instruction as the operation instruction, the preset instruction model being used to reflect the mapping relationship between each complete intention vector and an instruction;
and performing function control on the target device according to the operation instruction to realize voice interaction.
6. The wake-up-free voice interaction method according to any one of claims 1 to 5, wherein before the performing speech recognition on the target audio signal of the current environment to obtain text information, the wake-up-free voice interaction method further comprises:
receiving the sound of the current environment through a microphone of the current device, and generating the target audio signal according to the sound of the current environment.
7. A wake-up-free voice interaction apparatus, characterized by comprising: an information acquisition module, a semantic recognition module, a filtering module and a voice interaction module;
the information acquisition module is configured to perform speech recognition on a target audio signal of the current environment to obtain text information;
the semantic recognition module is configured to perform semantic recognition on the text information according to a preset intention word database to obtain a complete intention vector, wherein the complete intention vector includes but is not limited to a dialog object, a functional field, an instruction verb and instruction entity parameters, and the dialog object is a preset imaginary persona representing the device;
the filtering module is configured to obtain the current dialog scene to which the current environment belongs and filter out irrelevant fields in the complete intention vector according to the current dialog scene;
the voice interaction module is configured to obtain an operation instruction according to the filtered complete intention vector and perform function control on the target device according to the operation instruction to realize voice interaction;
and the filtering module is further configured to obtain the current dialog scene to which the current environment belongs, substitute the current dialog scene into a preset dialog field model to obtain a current dialog scene field set, match the complete intention vector against the current dialog scene field set to obtain the fields that cannot be matched, and take the fields that cannot be matched as the irrelevant fields in the complete intention vector, wherein the current dialog scene is a dialog scene determined by analyzing the target audio signal in combination with historical audio signals.
8. A wake-up-free voice interaction device, characterized by comprising: a memory, a processor, and a wake-up-free voice interaction program stored in the memory and executable on the processor, the wake-up-free voice interaction program being configured to implement the steps of the wake-up-free voice interaction method according to any one of claims 1 to 6.
9. A storage medium, characterized in that the storage medium stores a wake-up-free voice interaction program which, when executed by a processor, implements the steps of the wake-up-free voice interaction method according to any one of claims 1 to 6.
CN201811464212.XA 2018-11-30 2018-11-30 Wake-up-free voice interaction method, device, equipment and storage medium Active CN109326289B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811464212.XA CN109326289B (en) 2018-11-30 2018-11-30 Wake-up-free voice interaction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811464212.XA CN109326289B (en) 2018-11-30 2018-11-30 Wake-up-free voice interaction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109326289A CN109326289A (en) 2019-02-12
CN109326289B true CN109326289B (en) 2021-10-22

Family

ID=65256817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811464212.XA Active CN109326289B (en) 2018-11-30 2018-11-30 Wake-up-free voice interaction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109326289B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110225386B (en) * 2019-05-09 2021-09-14 海信视像科技股份有限公司 Display control method and display device
CN111754989B (en) * 2019-05-28 2023-04-07 广东小天才科技有限公司 Avoiding method for voice false wake-up and electronic equipment
CN110047487B (en) * 2019-06-05 2022-03-18 广州小鹏汽车科技有限公司 Wake-up method and device for vehicle-mounted voice equipment, vehicle and machine-readable medium
CN112397060B (en) * 2019-07-31 2024-02-23 北京声智科技有限公司 Voice instruction processing method, system, equipment and medium
CN112397062A (en) 2019-08-15 2021-02-23 华为技术有限公司 Voice interaction method, device, terminal and storage medium
CN110647622A (en) * 2019-09-29 2020-01-03 北京金山安全软件有限公司 Interactive data validity identification method and device
CN110660385A (en) * 2019-09-30 2020-01-07 出门问问信息科技有限公司 Command word detection method and electronic equipment
CN112702469B (en) * 2019-10-23 2022-07-22 阿里巴巴集团控股有限公司 Voice interaction method and device, audio and video processing method and voice broadcasting method
US11594224B2 (en) 2019-12-04 2023-02-28 Samsung Electronics Co., Ltd. Voice user interface for intervening in conversation of at least one user by adjusting two different thresholds
CN111462741B (en) * 2020-03-02 2024-02-02 北京声智科技有限公司 Voice data processing method, device and storage medium
CN113393834B (en) * 2020-03-11 2024-04-16 阿里巴巴集团控股有限公司 Control method and device
CN112185374A (en) * 2020-09-07 2021-01-05 北京如影智能科技有限公司 Method and device for determining voice intention
CN112068795A (en) * 2020-09-10 2020-12-11 中航华东光电(上海)有限公司 Airborne screen brightness auxiliary control system and method based on intelligent voice
CN112233699B (en) * 2020-10-13 2023-04-28 中移(杭州)信息技术有限公司 Voice broadcasting method, intelligent voice equipment and computer readable storage medium
CN112230877A (en) * 2020-10-16 2021-01-15 惠州Tcl移动通信有限公司 Voice operation method and device, storage medium and electronic equipment
CN112347234A (en) * 2020-11-05 2021-02-09 北京羽扇智信息科技有限公司 Text display method and device
CN112614490B (en) * 2020-12-09 2024-04-16 北京罗克维尔斯科技有限公司 Method, device, medium, equipment, system and vehicle for generating voice instruction
CN112802452A (en) * 2020-12-21 2021-05-14 出门问问(武汉)信息科技有限公司 Junk instruction identification method and device
CN112527236A (en) * 2020-12-23 2021-03-19 北京梧桐车联科技有限责任公司 Voice mode control method, device and storage medium
CN112667076A (en) * 2020-12-23 2021-04-16 广州橙行智动汽车科技有限公司 Voice interaction data processing method and device
CN112802470A (en) * 2020-12-30 2021-05-14 厦门市美亚柏科信息股份有限公司 Offline voice control method and terminal
CN112951235B (en) * 2021-01-27 2022-08-16 北京云迹科技股份有限公司 Voice recognition method and device
CN112801239B (en) * 2021-01-28 2023-11-21 科大讯飞股份有限公司 Input recognition method, input recognition device, electronic equipment and storage medium
CN113330513B (en) * 2021-04-20 2024-08-27 华为技术有限公司 Voice information processing method and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268313A (en) * 2013-05-21 2013-08-28 北京云知声信息技术有限公司 Method and device for semantic analysis of natural language
CN107195303A (en) * 2017-06-16 2017-09-22 北京云知声信息技术有限公司 Method of speech processing and device
CN107492374A (en) * 2017-10-11 2017-12-19 深圳市汉普电子技术开发有限公司 A kind of sound control method, smart machine and storage medium
CN108364650A (en) * 2018-04-18 2018-08-03 北京声智科技有限公司 The adjusting apparatus and method of voice recognition result
CN108806674A (en) * 2017-05-05 2018-11-13 北京搜狗科技发展有限公司 A kind of positioning navigation method, device and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100679043B1 (en) * 2005-02-15 2007-02-05 삼성전자주식회사 Apparatus and method for spoken dialogue interface with task-structured frames

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268313A (en) * 2013-05-21 2013-08-28 北京云知声信息技术有限公司 Method and device for semantic analysis of natural language
CN108806674A (en) * 2017-05-05 2018-11-13 北京搜狗科技发展有限公司 A kind of positioning navigation method, device and electronic equipment
CN107195303A (en) * 2017-06-16 2017-09-22 北京云知声信息技术有限公司 Method of speech processing and device
CN107492374A (en) * 2017-10-11 2017-12-19 深圳市汉普电子技术开发有限公司 A kind of sound control method, smart machine and storage medium
CN108364650A (en) * 2018-04-18 2018-08-03 北京声智科技有限公司 The adjusting apparatus and method of voice recognition result

Also Published As

Publication number Publication date
CN109326289A (en) 2019-02-12

Similar Documents

Publication Publication Date Title
CN109326289B (en) Wake-up-free voice interaction method, device, equipment and storage medium
CN111508474B (en) Voice interruption method, electronic equipment and storage device
CN108962262B (en) Voice data processing method and device
CN102568478B (en) Video play control method and system based on voice recognition
CN107644638B (en) Audio recognition method, device, terminal and computer readable storage medium
CN109584860B (en) Voice wake-up word definition method and system
CN112201246B (en) Intelligent control method and device based on voice, electronic equipment and storage medium
CN107145329A (en) Apparatus control method, device and smart machine
CN110047481B (en) Method and apparatus for speech recognition
CN111161714B (en) Voice information processing method, electronic equipment and storage medium
CN111261151B (en) Voice processing method and device, electronic equipment and storage medium
CN109994106B (en) Voice processing method and equipment
CN102999161A (en) Implementation method and application of voice awakening module
CN116844543A (en) Control method and system based on voice interaction
CN111916068B (en) Audio detection method and device
CN111145763A (en) GRU-based voice recognition method and system in audio
CN110473542B (en) Awakening method and device for voice instruction execution function and electronic equipment
CN113593565B (en) Intelligent home device management and control method and system
CN111862943B (en) Speech recognition method and device, electronic equipment and storage medium
CN108492826B (en) Audio processing method and device, intelligent equipment and medium
CN109065026B (en) Recording control method and device
CN116391225A (en) Method and system for assigning unique voices to electronic devices
CN118280356A (en) Voice interaction method, electronic equipment, vehicle and storage medium
CN116016779A (en) Voice call translation assisting method, system, computer equipment and storage medium
KR20240090400A (en) Continuous conversation based on digital signal processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant