CN112037786A - Voice interaction method, device, equipment and storage medium
- Publication number
- CN112037786A (application CN202010896268.3A)
- Authority
- CN
- China
- Prior art keywords
- voice
- preset word
- determining
- word
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
  - G10L15/08—Speech classification or search
    - G10L15/18—Speech classification or search using natural language modelling
      - G10L15/1822—Parsing for meaning understanding
  - G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
    - G10L2015/223—Execution procedure of a spoken command
Abstract
The application discloses a voice interaction method, device, equipment and storage medium, and relates to the fields of smart home and artificial intelligence. The specific implementation scheme is as follows: monitoring the voice of a user in real time; recognizing the voice, and determining whether the voice includes a first preset word; in response to determining that the voice includes the first preset word, determining whether the context information following the first preset word in the voice includes a second preset word; in response to determining that the context information of the first preset word includes the second preset word, performing intent recognition on the context information of the second preset word; and controlling the device according to the intention recognition result so as to respond to the user. This implementation makes the interaction process of the device more adaptive and the user experience friendlier.
Description
Technical Field
The application relates to the field of computer technology, in particular to the fields of smart home and artificial intelligence, and more particularly to a voice interaction method, device, equipment and storage medium.
Background
With the continuous development of artificial intelligence technology, terminal device control systems based on voice wake-up have also been developing rapidly. Voice wake-up, as the entrance for controlling a terminal device, has gradually become a research hotspot in the field of artificial intelligence.
At present, a user can wake up a terminal device by voice and control it to perform corresponding operations, which brings great convenience. However, since different users have different wake-up habits, how to adapt the terminal device to these different habits is a problem to be solved.
Disclosure of Invention
A voice interaction method, apparatus, device and storage medium are provided.
According to a first aspect, there is provided a voice interaction method, comprising: monitoring the voice of a user in real time; recognizing the voice, and determining whether the voice includes a first preset word; in response to determining that the voice includes the first preset word, determining whether the context information following the first preset word in the voice includes a second preset word; in response to determining that the context information of the first preset word includes the second preset word, performing intent recognition on the context information of the second preset word; and controlling the device to respond to the user according to the intention recognition result.
According to a second aspect, there is provided a voice interaction apparatus, comprising: a real-time monitoring unit configured to monitor the voice of a user in real time; a voice recognition unit configured to recognize the voice and determine whether the voice includes a first preset word; a determination unit configured to determine, in response to determining that the voice includes the first preset word, whether the context information following the first preset word in the voice includes a second preset word; an intention recognition unit configured to perform intent recognition on the context information of the second preset word in response to determining that the context information of the first preset word includes the second preset word; and a device control unit configured to control the device to respond to the user according to the intention recognition result.
According to a third aspect, there is provided a voice interaction electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the first aspect.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described in the first aspect.
According to the technology of the application, the technical problem that existing terminal device wake-up methods cannot adapt well to the wake-up habits of different users is solved, making the interaction process of the device more adaptive and the user experience friendlier.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a voice interaction method according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a voice interaction method according to the present application;
FIG. 4 is a flow diagram of another embodiment of a voice interaction method according to the present application;
FIG. 5 is a schematic block diagram of one embodiment of a voice interaction device according to the present application;
FIG. 6 is a block diagram of an electronic device for implementing the voice interaction method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the voice interaction method or voice interaction apparatus of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include intelligent end devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the intelligent terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the intelligent terminal device 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as a voice recognition application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the intelligent terminal devices 101, 102, 103.
The intelligent terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices with a voice recognition function, including but not limited to smart phones, smart speakers, smart robots, and the like. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not specifically limited herein.
The server 105 may be a server that provides various services, such as a background server that processes speech acquired by the smart terminal apparatuses 101, 102, 103. The backend server may analyze and otherwise process the data such as the voice, and feed back the processing result (e.g., response data) to the smart terminal apparatus 101, 102, 103.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed cluster of multiple servers or as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not specifically limited herein.
It should be noted that the voice interaction method provided by the embodiment of the present application is generally executed by the intelligent terminal devices 101, 102, and 103. Accordingly, the voice interaction device is generally disposed in the intelligent terminal apparatus 101, 102, 103.
It should be understood that the number of intelligent end devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of intelligent end devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a voice interaction method according to the present application is shown. The voice interaction method of the embodiment comprises the following steps:
Step 201, monitoring the voice of a user in real time.
In this embodiment, the execution subject of the voice interaction method (for example, the intelligent terminal devices 101, 102, 103 shown in fig. 1) may monitor the voice of the user in real time. Specifically, the execution subject may be provided with a microphone array for collecting the voice of the user in real time and analyzing it.
Step 202, recognizing the voice, and determining whether the voice includes a first preset word.
After the execution subject collects the voice of the user, it may recognize the voice and determine whether the voice includes a first preset word. Specifically, the execution subject may perform speech recognition on the voice to obtain the corresponding text, and then determine whether the text includes the first preset word. Here, the first preset word may be a part of the wake-up word of the intelligent terminal device, for example its first two characters: if the wake-up word is "small A small B", the first preset word may be "small A".
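For illustration only, the following minimal Python sketch shows one way steps 201-202 could look with an off-the-shelf recognizer. The open-source SpeechRecognition package, the wake-up word "small A small B", and the helper name listen_and_check are assumptions made for this example; the patent does not prescribe any particular recognizer.

```python
# Illustrative sketch of steps 201-202, assuming the open-source
# SpeechRecognition package as the ASR front end and "small A" as the
# first preset word; neither choice comes from the patent itself.
import speech_recognition as sr

FIRST_PRESET_WORD = "small A"  # assumed first half of the wake-up word

def listen_and_check() -> tuple[str, bool]:
    """Capture one utterance and report whether the first preset word occurs."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:          # stands in for the microphone array
        audio = recognizer.listen(source)    # step 201: monitor the user's voice
    try:
        text = recognizer.recognize_google(audio)   # speech-to-text
    except sr.UnknownValueError:             # nothing intelligible was said
        return "", False
    return text, FIRST_PRESET_WORD in text   # step 202: first preset word check
```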
Step 203, in response to determining that the voice includes the first preset word, determining whether the context information following the first preset word in the voice includes a second preset word.
In this embodiment, if the voice includes the first preset word, the execution subject may determine whether the context information following the first preset word in the voice includes a second preset word. Here, the second preset word may be another part of the wake-up word, for example its last two characters. It is understood that the first preset word and the second preset word may be the same or different.
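A minimal sketch of the two-stage check in steps 202-203 follows, assuming the wake-up word "small A small B" is split into the first preset word "small A" and the second preset word "small B"; the concrete words and the string-based matching are illustrative assumptions, not values fixed by the patent.

```python
# Hedged sketch of steps 202-203: locate the first preset word, then test
# whether the context immediately following it begins with the second preset
# word. The preset words are assumed examples.

FIRST_PRESET_WORD = "small A"
SECOND_PRESET_WORD = "small B"

def find_preset_words(transcript: str) -> tuple[str, str | None] | None:
    """Return (context after first word, context after second word or None),
    or None when the first preset word is absent."""
    idx = transcript.find(FIRST_PRESET_WORD)
    if idx == -1:
        return None                                   # step 202: not a wake-up
    after_first = transcript[idx + len(FIRST_PRESET_WORD):].lstrip(" ,")
    if after_first.startswith(SECOND_PRESET_WORD):    # step 203 succeeds
        return after_first, after_first[len(SECOND_PRESET_WORD):].lstrip(" ,")
    return after_first, None                          # step 203 fails

# find_preset_words("small A small B, turn off the light")
#   -> ("small B, turn off the light", "turn off the light")
```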
Step 204, in response to determining that the context information of the first preset word includes the second preset word, performing intent recognition on the context information following the second preset word.
In this embodiment, if the context information of the first preset word includes the second preset word, the execution subject may perform intent recognition on the context information following the second preset word in the voice. Specifically, the execution subject may perform intent recognition using an existing algorithm. For example, it may input the text following the second preset word into a preset intention recognition model and take the output of the model as the recognized intention.
Step 205, controlling the device to respond to the user according to the intention recognition result.
In this embodiment, the execution subject may control the device according to the intention recognition result to respond to the user. For example, if the intention recognition result is turning off the light, the execution subject may generate a turn-off command to turn off the light, thereby realizing a response to the user.
With continued reference to FIG. 3, a schematic diagram of one application scenario of the voice interaction method according to the present application is shown. In the application scenario of fig. 3, the user says "small A small B, turn off the light" to the smart speaker. After receiving the voice, the smart speaker recognizes that it includes the first preset word "small A" and that the context information of "small A" includes the second preset word "small B". Intent recognition is then performed on the text "turn off the light" following the second preset word "small B", yielding the intent of turning off the light. The smart speaker then sends a turn-off command to the lamp to turn off the electric light.
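For the FIG. 3 scenario, a hedged sketch of steps 204-205 is given below. The keyword table stands in for the preset intention recognition model mentioned above, and the intent labels and the send_to_lamp transport are assumptions made for this example only.

```python
# Toy sketch of steps 204-205 for the FIG. 3 scenario. recognize_intent()
# stands in for the preset intention recognition model; the labels and the
# lamp command transport are illustrative assumptions.

def recognize_intent(command_text: str) -> str | None:
    """Keyword stand-in for the intention recognition model."""
    if "turn off the light" in command_text:
        return "light_off"
    if "turn on the light" in command_text:
        return "light_on"
    return None

def send_to_lamp(state: str) -> None:
    print(f"{state} command sent to the lamp")   # placeholder transport

def respond(after_second: str) -> None:
    """after_second is the context following the second preset word,
    e.g. "turn off the light" in the FIG. 3 scenario."""
    intent = recognize_intent(after_second)      # step 204: intent recognition
    if intent == "light_off":
        send_to_lamp("turn-off")                 # step 205: control the device
    elif intent == "light_on":
        send_to_lamp("turn-on")
```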
The voice interaction method provided by the embodiment of the application enables the interaction process of the equipment to become more adaptive and user experience to be more friendly.
With continued reference to FIG. 4, a flow 400 of another embodiment of a voice interaction method according to the present application is shown. As shown in fig. 4, the voice interaction method of the present embodiment may include the following steps:
Step 401, monitoring the voice of a user in real time.
Step 402, recognizing the voice, and determining whether the voice includes a first preset word.
Step 403, in response to determining that the voice includes the first preset word, determining whether the context information following the first preset word in the voice includes the second preset word.
Step 404, in response to determining that the context information of the first preset word includes the second preset word, performing intent recognition on the context information following the second preset word.
The principle of steps 401 to 404 is similar to that of steps 201 to 204, and is not described herein again.
Step 405, in response to determining that the context information of the first preset word does not include the second preset word, performing intent recognition on the context information following the first preset word.
In this embodiment, if the context information following the first preset word in the voice does not include the second preset word, the execution subject may perform intent recognition on the context information of the first preset word. The absence of the second preset word indicates that the user is in the habit of waking the intelligent device with only the first part of the preset wake-up word, so intent recognition can be performed directly on the context information of the first preset word.
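A minimal sketch of the branch formed by steps 404 and 405 follows, building on the hypothetical find_preset_words() helper sketched earlier; the function name and return shape are assumptions carried over from that sketch.

```python
# Sketch of steps 404-405: prefer the context after the second preset word,
# and fall back to the context after the first preset word when the user
# habitually wakes the device with the first part alone. Relies on the
# hypothetical find_preset_words() helper from the earlier sketch.

def pick_intent_context(transcript: str) -> str | None:
    hit = find_preset_words(transcript)
    if hit is None:
        return None              # first preset word absent: not addressed to us
    after_first, after_second = hit
    if after_second is not None:
        return after_second      # step 404: "small A small B, <instruction>"
    return after_first           # step 405: "small A, <instruction>" habit
```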
Step 406, controlling the device to respond to the user according to the intention recognition result.
Step 407, determining the interaction habit of the user with the device according to the voice.
In this embodiment, the execution subject may also determine the user's interaction habit with the device according to the voice. Here, an interaction habit may be understood as the control pattern the user most often uses when interacting with the device, for example "first preset word + instruction", or "first preset word + second preset word + pause + instruction".
In some optional implementation manners of this embodiment, the step 407 may be specifically implemented by the following steps not shown in fig. 4: in response to determining that the context information of the first preset word includes the second preset word, determining that a combination of the first preset word and the second preset word is a common wake-up word of the user for the device.
In this implementation, if it is determined that the context information of the first preset word includes the second preset word, the execution subject considers that the wake-up word used by the user is the first preset word plus the second preset word, and may take this combination as the user's common wake-up word for the device.
In some optional implementation manners of this embodiment, the step 407 may be specifically implemented by the following steps not shown in fig. 4: and in response to determining that the context information of the first preset word does not include the second preset word, determining that the first preset word is a common wake-up word of the user to the device.
In this implementation, if the execution subject determines that the context information of the first preset word does not include the second preset word, it determines that the user frequently wakes the device with the first preset word alone, and may therefore take the first preset word as the user's common wake-up word for the device.
In some optional implementations of this embodiment, step 407 may be further implemented by the following step not shown in fig. 4: determining, according to the voice, the pause time after the common wake-up word.
In this implementation, the execution subject may also determine the pause time after the wake-up word. If the user habitually pauses after the wake-up word, the device may respond to the user immediately, for example by outputting the voice "I'm here", in order to improve the user experience.
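As an illustration of how the pause time might be measured, the sketch below assumes the recognizer exposes word-level timestamps as (word, start_s, end_s) tuples; this output format is an assumption, not a specific ASR API.

```python
# Illustrative pause-time measurement from assumed word-level timestamps.

def pause_after_wake_word(timed_words: list[tuple[str, float, float]],
                          n_wake_tokens: int) -> float:
    """Seconds between the end of the wake-up word and the next spoken word."""
    if len(timed_words) <= n_wake_tokens:
        return float("inf")       # the user stopped right after the wake-up word
    wake_end = timed_words[n_wake_tokens - 1][2]     # end of last wake-up token
    next_start = timed_words[n_wake_tokens][1]       # start of the instruction
    return max(0.0, next_start - wake_end)
```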
Step 408, outputting response information according to the interaction habit.
The execution subject may output response information according to the interaction habit. For example, if the interaction habit is "first preset word + second preset word + pause + instruction", the execution subject may output response information immediately after the user has spoken the first preset word and the second preset word, and then control the device once the user speaks the instruction.
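A minimal sketch of step 408 under the stated assumptions: when the learned habit is "wake-up word + pause + instruction", the device answers as soon as the wake-up word ends instead of waiting for the instruction. The habit record, the threshold, and the speak() placeholder are illustrative, not taken from the patent text.

```python
# Hedged sketch of step 408: respond during the user's habitual pause.

PAUSE_THRESHOLD_S = 0.8   # assumed cut-off for "habitually pauses"; tune per device

def speak(text: str) -> None:
    print(f"[TTS] {text}")           # placeholder for the device's TTS output

def on_wake_word_detected(habit: dict) -> None:
    """habit is an assumed record such as {"typical_pause_s": 1.2}."""
    if habit.get("typical_pause_s", 0.0) >= PAUSE_THRESHOLD_S:
        speak("I'm here")            # immediate response, as described above
    # ...then continue listening and pass the instruction to intent recognition
```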
The voice interaction method provided by this embodiment of the application can output response information according to the user's interaction habit with the device, which improves the adaptability of the device and the user experience.
With further reference to fig. 5, as an implementation of the method shown in the above-mentioned figures, the present application provides an embodiment of a voice interaction apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied to various electronic devices.
As shown in fig. 5, the voice interaction apparatus 500 of the present embodiment includes: a real-time monitoring unit 501, a voice recognition unit 502, a determination unit 503, an intention recognition unit 504, and a device control unit 505.
A real-time monitoring unit 501 configured to monitor the voice of the user in real time.
The speech recognition unit 502 is configured to recognize the speech and determine whether the speech includes a first preset word.
The determination unit 503 is configured to determine, in response to determining that the voice includes the first preset word, whether the context information following the first preset word in the voice includes a second preset word.
An intent recognition unit 504 configured to perform intent recognition on context information of a second preset word in response to determining that the context information of the first preset word includes the second preset word.
A device control unit 505 configured to control the device to respond to the user according to the intention recognition result.
In some optional implementations of the present embodiment, the intent recognition unit 504 may be further configured to: in response to determining that the context information of the first preset word does not include the second preset word, intent recognition is performed on the context information of the first preset word.
In some optional implementations of this embodiment, the apparatus 500 may further include an interaction habit determining unit, not shown in fig. 5, configured to: determining the interaction habit of a user to the equipment according to the voice; and outputting response information according to the interaction habit.
In some optional implementations of the present embodiment, the interaction habit determination unit is further configured to: in response to determining that the context information of the first preset word includes the second preset word, determining that a combination of the first preset word and the second preset word is a common wake-up word of the user for the device.
In some optional implementations of the present embodiment, the interaction habit determination unit is further configured to: and in response to determining that the context information of the first preset word does not include the second preset word, determining that the first preset word is a common wake-up word of the user to the device.
In some optional implementations of the present embodiment, the interaction habit determination unit is further configured to: from the speech, a dwell time after the common wake up word is determined.
It should be understood that units 501 to 505 recited in the voice interaction apparatus 500 correspond to respective steps in the method described with reference to fig. 2. Thus, the operations and features described above for the voice interaction method are also applicable to the apparatus 500 and the units included therein, and are not described herein again.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device for performing a voice interaction method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the electronic apparatus includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, as desired, along with multiple memories. Also, multiple electronic devices may be connected, with each device providing some of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the method of performing voice interaction provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the method of performing voice interaction provided by the present application.
The memory 602, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method for performing voice interaction in the embodiment of the present application (for example, the real-time monitoring unit 501, the voice recognition unit 502, the determination unit 503, the intention recognition unit 504, and the device control unit 505 shown in fig. 5). The processor 601 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 602, that is, implementing the method of performing voice interaction in the above-described method embodiments.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device performing voice interaction, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, which may be connected through a network to an electronic device performing voice interactions. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device performing the voice interaction method may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to performing user settings and function control of the voice interactive electronic apparatus, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the application, the technical problem that existing terminal device wake-up methods cannot adapt well to the wake-up habits of different users is solved, making the interaction process of the device more adaptive and the user experience friendlier.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, which is not limited herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (14)
1. A voice interaction method, comprising:
monitoring the voice of a user in real time;
recognizing the voice, and determining whether the voice comprises a first preset word or not;
in response to determining that the voice includes a first preset word, determining whether the context information following the first preset word in the voice includes a second preset word;
in response to determining that the context information of the first preset word includes a second preset word, performing intent recognition on the context information of the second preset word;
and controlling the equipment to respond to the user according to the intention recognition result.
2. The method of claim 1, wherein the method further comprises:
in response to determining that the contextual information of the first preset term does not include a second preset term, intent recognition is performed on the contextual information of the first preset term.
3. The method of claim 2, wherein the method further comprises:
determining the interaction habit of the user with the device according to the voice;
and outputting response information according to the interaction habit.
4. The method of claim 3, wherein said determining interaction habits of the user with the device from the speech comprises:
in response to determining that the context information of the first preset word includes the second preset word, determining that a combination of the first preset word and the second preset word is a common wake-up word of the user for the device.
5. The method of claim 3, wherein said determining interaction habits of the user with the device from the speech comprises:
in response to determining that the context information of the first preset word does not include a second preset word, determining that the first preset word is a common wake-up word of the user for the device.
6. The method of claim 4 or 5, wherein the determining interaction habits of the user on the device from the speech comprises:
and determining the pause time after the common wake-up word according to the voice.
7. A voice interaction device, comprising:
a real-time monitoring unit configured to monitor a voice of a user in real time;
a voice recognition unit configured to recognize the voice and determine whether the voice includes a first preset word;
a determination unit configured to determine, in response to determining that the voice includes the first preset word, whether the context information following the first preset word in the voice includes a second preset word;
an intention recognition unit configured to perform intention recognition on context information of a second preset word in response to a determination that the context information of the first preset word includes the second preset word;
and a device control unit configured to control the device to respond to the user according to the intention recognition result.
8. The apparatus of claim 7, wherein the intent recognition unit is further configured to:
in response to determining that the contextual information of the first preset term does not include a second preset term, intent recognition is performed on the contextual information of the first preset term.
9. The apparatus of claim 7, wherein the apparatus further comprises an interaction habit determination unit configured to:
determining the interaction habit of the user with the device according to the voice;
and outputting response information according to the interaction habit.
10. The apparatus of claim 9, wherein the interaction habit determination unit is further configured to:
in response to determining that the context information of the first preset word includes the second preset word, determining that a combination of the first preset word and the second preset word is a common wake-up word of the user for the device.
11. The apparatus of claim 9, wherein the interaction habit determination unit is further configured to:
in response to determining that the context information of the first preset word does not include a second preset word, determining that the first preset word is a common wake-up word of the user for the device.
12. The apparatus according to claim 10 or 11, wherein the interaction habit determination unit is further configured to:
and determining the pause time after the common wake-up word according to the voice.
13. A voice interactive electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010896268.3A (granted as CN112037786B) | 2020-08-31 | 2020-08-31 | Voice interaction method, device, equipment and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010896268.3A (granted as CN112037786B) | 2020-08-31 | 2020-08-31 | Voice interaction method, device, equipment and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112037786A | 2020-12-04 |
| CN112037786B | 2024-09-24 |
Family
ID=73586455
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010896268.3A (Active; granted as CN112037786B) | Voice interaction method, device, equipment and storage medium | 2020-08-31 | 2020-08-31 |
Country Status (1)
| Country | Link |
|---|---|
| CN | CN112037786B |
Citations (17)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
CN103714815A (en) * | 2013-12-09 | 2014-04-09 | 何永 | Voice control method and device thereof |
CN108132805A (en) * | 2017-12-20 | 2018-06-08 | 深圳Tcl新技术有限公司 | Voice interactive method, device and computer readable storage medium |
CN108986801A (en) * | 2017-06-02 | 2018-12-11 | 腾讯科技(深圳)有限公司 | A kind of man-machine interaction method, device and human-computer interaction terminal |
CN109147779A (en) * | 2018-08-14 | 2019-01-04 | 苏州思必驰信息科技有限公司 | Voice data processing method and device |
CN109377998A (en) * | 2018-12-11 | 2019-02-22 | 科大讯飞股份有限公司 | A kind of voice interactive method and device |
CN109410952A (en) * | 2018-10-26 | 2019-03-01 | 北京蓦然认知科技有限公司 | A kind of voice awakening method, apparatus and system |
CN109509470A (en) * | 2018-12-11 | 2019-03-22 | 平安科技(深圳)有限公司 | Voice interactive method, device, computer readable storage medium and terminal device |
CN109584878A (en) * | 2019-01-14 | 2019-04-05 | 广东小天才科技有限公司 | Voice awakening method and system |
JP2019109510A (en) * | 2017-12-18 | 2019-07-04 | ネイバー コーポレーションNAVER Corporation | Method and system for controlling artificial intelligence device using plural wake words |
CN109994106A (en) * | 2017-12-29 | 2019-07-09 | 阿里巴巴集团控股有限公司 | A kind of method of speech processing and equipment |
CN110299137A (en) * | 2018-03-22 | 2019-10-01 | 腾讯科技(深圳)有限公司 | Voice interactive method and device |
CN110534102A (en) * | 2019-09-19 | 2019-12-03 | 北京声智科技有限公司 | A kind of voice awakening method, device, equipment and medium |
CN110634468A (en) * | 2019-09-11 | 2019-12-31 | 中国联合网络通信集团有限公司 | Voice wake-up method, device, equipment and computer readable storage medium |
US20200105274A1 (en) * | 2018-09-27 | 2020-04-02 | Snackable Inc. | Audio content processing systems and methods |
CN111063356A (en) * | 2018-10-17 | 2020-04-24 | 北京京东尚科信息技术有限公司 | Electronic equipment response method and system, sound box and computer readable storage medium |
CN111261160A (en) * | 2020-01-20 | 2020-06-09 | 联想(北京)有限公司 | Signal processing method and device |
US20200258513A1 (en) * | 2019-02-08 | 2020-08-13 | Sonos, Inc. | Devices, systems, and methods for distributed voice processing |
Also Published As
Publication number | Publication date |
---|---|
CN112037786B (en) | 2024-09-24 |
Similar Documents
| Publication | Title |
|---|---|
CN111428008B (en) | Method, apparatus, device and storage medium for training a model | |
CN111192591B (en) | Awakening method and device of intelligent equipment, intelligent sound box and storage medium | |
CN111680517B (en) | Method, apparatus, device and storage medium for training model | |
CN111640426A (en) | Method and apparatus for outputting information | |
CN112382294B (en) | Speech recognition method, device, electronic equipment and storage medium | |
CN112382279B (en) | Voice recognition method and device, electronic equipment and storage medium | |
CN111862940A (en) | Earphone-based translation method, device, system, equipment and storage medium | |
CN112530419B (en) | Speech recognition control method, device, electronic equipment and readable storage medium | |
CN111443801B (en) | Man-machine interaction method, device, equipment and storage medium | |
CN111966212A (en) | Multi-mode-based interaction method and device, storage medium and smart screen device | |
CN111309283A (en) | Voice control method and device for user interface, electronic equipment and storage medium | |
CN112634890B (en) | Method, device, equipment and storage medium for waking up playing equipment | |
JP7257434B2 (en) | Voice interaction method, voice interaction device, electronic device, storage medium and computer program product | |
CN111681647A (en) | Method, apparatus, device and storage medium for recognizing word slot | |
CN111709252A (en) | Model improvement method and device based on pre-trained semantic model | |
CN112133307A (en) | Man-machine interaction method and device, electronic equipment and storage medium | |
CN111883127A (en) | Method and apparatus for processing speech | |
CN112466296A (en) | Voice interaction processing method and device, electronic equipment and storage medium | |
CN112652304B (en) | Voice interaction method and device of intelligent equipment and electronic equipment | |
CN112382292A (en) | Voice-based control method and device | |
CN111986682A (en) | Voice interaction method, device, equipment and storage medium | |
CN112037794A (en) | Voice interaction method, device, equipment and storage medium | |
CN111192581A (en) | Voice wake-up method, device and storage medium | |
CN112037786B (en) | Voice interaction method, device, equipment and storage medium | |
CN114861675A (en) | Method and device for semantic recognition and method and device for generating control instruction |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| 2021-05-17 | TA01 | Transfer of patent application right | Address after: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing; applicant after: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd. and Shanghai Xiaodu Technology Co.,Ltd. Address before: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing; applicant before: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd. |
| | GR01 | Patent grant | |