CN112185374A - Method and device for determining voice intention - Google Patents
Method and device for determining voice intention
- Publication number
- CN112185374A (application number CN202010929640.6A)
- Authority
- CN
- China
- Prior art keywords
- voice
- intention
- context
- template
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method and a device for determining a voice intention, which are used to obtain a more accurate voice intention and thereby facilitate accurate voice control. The method comprises the following steps: obtaining an input voice; obtaining a scene context associated with the voice; matching the voice content of the voice and the scene context against preset intention templates; and determining the voice intention according to the matching intention template.
Description
Technical Field
The present invention relates to the field of computer and communication technologies, and in particular, to a method and an apparatus for determining a voice intention.
Background
Artificial intelligence is one of the important technical fields of current research, and image technology and voice technology are two of its important basic technologies. Within voice technology, how to understand a speaker's intention more accurately is an important research direction. One common processing method is to convert the user's voice into text and then perform sentence-structure analysis on the text to determine the voice intention. The voice intention obtained in this way is not sufficiently accurate.
Disclosure of Invention
The invention provides a method and a device for determining a voice intention, which are used to obtain a more accurate voice intention and thereby facilitate accurate voice control.
The invention provides a method for determining a voice intention, which comprises the following steps:
obtaining an input voice;
obtaining a context associated with the speech;
matching the voice content of the voice and the scene context against preset intention templates;
and determining the voice intention according to the matching intention template.
The technical solution provided by this embodiment of the invention can have the following beneficial effect: by combining scene information beyond the voice content itself, the voice intention obtained by analysis is more accurate.
Optionally, the scene context includes at least one of: the time at which the voice is obtained, the position of the user providing the voice, environment information of the environment where the user is located, user portrait information of the user, and application state information of an application module.
The technical solution provided by this embodiment of the invention can have the following beneficial effect: various kinds of scene information about the situation of the user providing the voice are acquired, so that the user's voice intention can be analyzed from more dimensions.
Optionally, the method further comprises at least one of:
carrying out acoustic analysis on the voice to obtain acoustic characteristic information corresponding to voice content;
obtaining a dialog context for the speech;
matching the voice with historical voice of a user providing the voice to obtain historical information;
the matching of the voice content of the voice and the contextual context with a preset intention template comprises:
matching the voice content of the voice and the scene context with at least one of the following information and a preset intention template; wherein the following information includes: the acoustic feature information, the dialog context, and the history information.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects: the embodiment further combines the acoustic characteristics of the voice, the context of the conversation and the historical habits of the user to more accurately analyze the voice intention of the user.
Optionally, when there are at least two matching intention templates, the method further includes:
scoring the at least two intention templates according to a plurality of preset scoring modes and the priority of each scoring mode;
the determining the voice intention according to the matching intention template includes:
when a single highest-scoring intention template is obtained, determining the voice intention according to the highest-scoring intention template.
The technical solution provided by this embodiment of the invention can have the following beneficial effect: when multiple intention templates match, multiple scoring modes can be used to select the better intention template, so as to determine the voice intention more accurately.
Optionally, the multiple scoring modes include, in order of priority from high to low: a template scoring mode, a lexical scoring mode and a syntactic scoring mode.
The technical solution provided by this embodiment of the invention can have the following beneficial effect: multiple, prioritized scoring modes are provided, so that templates are evaluated from several angles.
The invention provides a device for determining voice intention, which comprises:
the voice module is used for obtaining input voice;
a context module to obtain a context associated with the speech;
the matching module is used for matching the voice content of the voice and the scene context against preset intention templates;
and the intention module is used for determining the voice intention according to the matching intention template.
Optionally, the scene context includes at least one of: the time at which the voice is obtained, the position of the user providing the voice, environment information of the environment where the user is located, user portrait information of the user, and application state information of an application module.
Optionally, the apparatus further comprises at least one of:
the acoustic module is used for carrying out acoustic analysis on the voice to obtain acoustic characteristic information corresponding to voice content;
the dialogue module is used for obtaining dialogue context of the voice;
the history module is used for matching the voice with the history voice of the user providing the voice to obtain history information;
the matching module includes:
the matching sub-module is used for matching the voice content of the voice, the scene context, and at least one of the following information against preset intention templates; wherein the following information includes: the acoustic feature information, the dialog context, and the history information.
Optionally, when there are at least two matching intention templates, the apparatus further includes:
the scoring module is used for scoring the at least two intention templates according to a plurality of preset scoring modes and the priority of each scoring mode;
the intent module includes:
and the intention submodule is used for determining the voice intention according to the highest-scoring intention template when a single highest-scoring intention template is obtained.
Optionally, the multiple scoring modes include, in order of priority from high to low: a template scoring mode, a lexical scoring mode and a syntactic scoring mode.
The invention provides a device for determining voice intention, which comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
obtaining an input voice;
obtaining a context associated with the speech;
matching the voice content of the voice and the scene context against preset intention templates;
and determining the voice intention according to the matching intention template.
The present invention provides a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the above method.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a method for determining a voice intention in an embodiment of the present invention;
FIG. 2 is a flow chart of a method for determining a voice intention in an embodiment of the present invention;
FIG. 3 is a flow chart of a method for determining a voice intention in an embodiment of the present invention;
FIG. 4 is a block diagram of an apparatus for determining a voice intention in an embodiment of the present invention;
FIG. 5 is a block diagram of an apparatus for determining a voice intention in an embodiment of the present invention;
FIG. 6 is a block diagram of a matching module in an embodiment of the invention;
FIG. 7 is a block diagram of an apparatus for determining a voice intention in an embodiment of the present invention;
FIG. 8 is a block diagram of an intent module in an embodiment of the invention.
Detailed Description
The preferred embodiments of the present invention are described below in conjunction with the accompanying drawings. It should be understood that they are presented here for illustration and explanation only and do not limit the invention.
In the related art, how to understand a speaker's intention more accurately is an important research direction of voice technology. One common processing method is to convert the user's voice into text and then perform sentence-structure analysis on the text to determine the voice intention. The voice intention obtained in this way is not sufficiently accurate.
To solve this problem, this embodiment combines scene context beyond the voice content itself, so as to analyze the user's voice intention more accurately.
Referring to fig. 1, the method for determining a voice intention in the present embodiment includes:
step 101: an input speech is obtained.
Step 102: a contextual context associated with the speech is obtained.
Step 103: matching the voice content of the voice and the scene context against preset intention templates.
Step 104: determining the voice intention according to the matching intention template.
If no matching intention template exists, the process ends, and a notification indicating that voice recognition failed may be fed back to the user.
The execution subject of this embodiment may be a central control device in a user's home. The received voice may be a wake-up voice that wakes an application module, or a command voice that inputs a control command to an application module. The voice-controlled application module may reside in the central control device itself, or in a smart device that has a network connection to the central control device.
On the basis of the voice content, this embodiment adds the scene context, i.e., information about the user's current situation, so that a more appropriate intention template is matched and the user's voice intention is determined more accurately.
Optionally, the scene context includes at least one of: the time at which the voice is obtained, the position of the user providing the voice, environment information of the environment where the user is located, user portrait information of the user, and application state information of an application module.
The scene information contained in the scene context complements the voice content. For example, suppose the voice content is "open the door" and the application module currently providing application state information is the video-call application of the building's entrance guard. The user's voice intention is then to open the building entrance door, not the door of the user's own home.
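The disambiguation above can be sketched in code. This is a minimal hypothetical illustration, not the patent's implementation; the function and application names (`resolve_door_intent`, `entrance_video_call`) are invented for the example.

```python
# Hypothetical sketch: the application state in the scene context
# disambiguates the same voice content ("open the door") between two
# candidate intentions. All names here are illustrative.
def resolve_door_intent(voice_content: str, active_app: str) -> str:
    """Pick a door-opening intention using the active application module."""
    if voice_content != "open the door":
        return "unknown"
    # If the building entrance's video-call app is active, the user most
    # likely means the building entrance door, not their home door.
    if active_app == "entrance_video_call":
        return "open_building_entrance_door"
    return "open_home_door"
```

The same pattern extends to the other scene dimensions (time, position, environment, user portrait) by adding further conditions or features.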
The position of the user includes a geographic position, a position within the home, and the like. The geographic position may be latitude and longitude coordinates, or a province, city, district, and street. A position within the home may be, for example, a bedroom or a living room. The position can thus be described from multiple angles.
The environmental information includes temperature information, weather information, and the like.
The user portrait information includes: age, gender, occupation, family role (e.g., father), etc.
The application module may be a local application module: when one of its functions is triggered, it sends its current application state information to the voice processing front end through the operating system. The application module may also reside in an external smart device: for example, the execution subject is a local central control device in a home, and the application module runs in a smart device such as an alarm clock, an entrance guard, or a speaker; when one of its functions is triggered, it sends its current application state information to the voice processing front end through the network. The application state information includes a sleep state, an active state, and the like.
Optionally, the method further comprises at least one of steps A1 to A3:
Step A1: and carrying out acoustic analysis on the voice to obtain acoustic characteristic information corresponding to the voice content.
Step A2: a dialog context for the speech is obtained.
Step A3: and matching the voice with the historical voice of the user providing the voice to obtain historical information.
The step 103 comprises: step A4.
Step A4: matching the voice content of the voice, the scene context, and at least one of the following information against preset intention templates; wherein the following information includes: the acoustic feature information, the dialog context, and the history information.
The acoustic feature information in this embodiment includes speech rate, stress, and the like. When speaking, people tend to emphasize key words and slow down over them. This characteristic helps in analyzing the user's intention.
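As a rough illustration of how stress might be recovered from speech rate, the sketch below flags words spoken noticeably more slowly than the utterance's average per-character rate. This is a hypothetical heuristic, assuming per-word durations are already available from the acoustic front end; the slowdown factor of 1.5 is arbitrary.

```python
# Hypothetical heuristic: a word is "stressed" if its per-character
# duration is well above the utterance's average per-character duration.
def stressed_words(words, durations, slowdown=1.5):
    """words: list of str; durations: seconds each word took to utter."""
    rates = [d / max(len(w), 1) for w, d in zip(words, durations)]
    avg = sum(rates) / len(rates)
    return [w for w, r in zip(words, rates) if r > slowdown * avg]
```

A speaker dwelling on "bedroom" in "turn on the bedroom light" would then surface that word as a likely keyword for intention analysis.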
The dialog context includes successive pieces of voice content from the same user. Within one round of dialog, a user's preceding and following utterances are often strongly correlated, so they can assist intention analysis.
The history information can reflect the user's language habits, living habits, and the like. The currently received voice can be matched against the user's historical voices; when a match is found, history information is obtained from the matching historical voice. The similarity threshold for this match may be set relatively low. In addition, the matching may be restricted to historical voices from the same time of day. For example, the current time is 7 a.m. and the received voice is "turn on the light"; the voice is matched against historical voices from around 7 a.m. over the past 5 days (a preset time period), a matching historical voice is "turn on the bathroom light", and the history information therefore includes "bathroom".
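The history lookup in this example can be sketched as follows. It is a hypothetical simplification: it assumes historical utterances are stored as (timestamp, text, extra detail) tuples, matches on exact text rather than a low similarity threshold, and uses an illustrative 30-minute time-of-day tolerance.

```python
from datetime import datetime, timedelta

# Hypothetical sketch: find past utterances of the same text made at
# roughly the same time of day within a recent window, and reuse their
# extra detail (e.g. "bathroom") as history information.
def history_info(now, text, history, days=5, tolerance_min=30):
    """history: list of (datetime, text, extra_detail) tuples."""
    def minutes(t):
        return t.hour * 60 + t.minute
    hits = []
    for when, past_text, detail in history:
        recent = (now - when) <= timedelta(days=days)
        same_slot = abs(minutes(now) - minutes(when)) <= tolerance_min
        if recent and same_slot and past_text == text:
            hits.append(detail)
    return hits
```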
On the basis of the voice content and the scene context, this embodiment further supplements the acoustic feature information, the dialog context, and the history information. This helps match a higher-quality intention template, so that the user's voice intention can be analyzed more accurately.
Optionally, when there are at least two matching intention templates, the method further includes: step B1.
Step B1: and scoring the at least two intention templates according to a plurality of preset scoring modes and the priority of each scoring mode.
The step 104 comprises: step B2.
Step B2: and when the highest-grade intention template is obtained, determining the voice intention according to the highest-grade intention template.
In this embodiment, an intention template includes slots and slot values, and there may be multiple slots. For example, an intention template takes the form { action: open, room: bedroom, device: desk lamp }, where action, room, and device are slots, and open, bedroom, and desk lamp are their corresponding values. Suppose the received voice is "turn on the bedroom light" and there are intention template 1 { action: open, device: lamp }, intention template 2 { action: open, device: desk lamp }, and intention template 3 { action: open, room: bedroom, device: desk lamp }. The voice may match all of templates 1 to 3. At this point, a higher-quality intention template needs to be selected to determine the voice intention more accurately.
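The slot-based matching just described can be sketched as a toy. This is hypothetical, not the patent's matcher: the lexicon and templates are invented, and a template matches when every slot/value pair it specifies is extracted from the utterance.

```python
# Hypothetical toy matcher: an intention template matches when every
# slot/value pair it specifies is found in the parsed utterance.
LEXICON = {  # illustrative phrase -> (slot, value) mapping
    "turn on": ("action", "open"),
    "bedroom": ("room", "bedroom"),
    "light": ("device", "lamp"),
}

def extract_slots(utterance):
    slots = {}
    for phrase, (slot, value) in LEXICON.items():
        if phrase in utterance:
            slots[slot] = value
    return slots

def matching_templates(utterance, templates):
    slots = extract_slots(utterance)
    return [t for t in templates
            if all(slots.get(s) == v for s, v in t.items())]

TEMPLATES = [
    {"action": "open", "device": "lamp"},
    {"action": "open", "room": "kitchen", "device": "lamp"},
    {"action": "open", "room": "bedroom", "device": "lamp"},
]
```

Here "turn on the bedroom light" matches the first and third templates, leaving exactly the kind of tie that the scoring modes are needed to break.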
This embodiment adopts multiple scoring modes, applied in order from high priority to low. When one scoring mode yields several intention templates tied for the highest score, scoring continues with the next scoring mode. When a scoring mode yields a single highest-scoring intention template, scoring ends and the subsequent scoring modes are not applied. In this way the embodiment obtains the highest-quality intention template, and thus a more accurate voice intention.
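The priority cascade just described can be sketched as follows — a minimal hypothetical version in which each scoring mode is a function, tried from highest to lowest priority until a unique winner emerges.

```python
# Hypothetical sketch of the cascaded scoring: scorers are applied in
# priority order; scoring stops as soon as one scorer yields a unique
# top-scoring template.
def pick_best(templates, scorers):
    """scorers: scoring functions ordered from highest to lowest priority."""
    candidates = list(templates)
    for score in scorers:
        best = max(score(t) for t in candidates)
        candidates = [t for t in candidates if score(t) == best]
        if len(candidates) == 1:
            return candidates[0]  # unique winner: skip remaining scorers
    return candidates[0]  # still tied after all scorers
```

With the patent's ordering, `scorers` would hold the template, lexical, and syntactic scoring functions, in that order.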
Optionally, the multiple scoring modes include, in order of priority from high to low: a template scoring mode, a lexical scoring mode and a syntactic scoring mode.
In this embodiment, the template scoring mode scores all intention templates in advance; once the matching intention templates are determined, their scores can be looked up directly.
The lexical scoring mode analyzes and scores the number and quality of the slots matched in each matching intention template. The more slots matched, the higher the score. The higher the quality of the matched slots, the higher the score and the better the user's real intention is reflected. Quality is reflected in the parts of speech (verbs, nouns, etc.) of the slots: the more diverse the parts of speech, the higher the quality and the better the user's real intention is captured.
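A hypothetical lexical score consistent with this description might combine the matched-slot count with part-of-speech diversity; the 0.5 weighting below is invented purely for illustration.

```python
# Hypothetical lexical score: more matched slots score higher, and a
# greater diversity of parts of speech among them scores higher still.
def lexical_score(matched_slots):
    """matched_slots: list of (slot_name, part_of_speech) pairs."""
    count = len(matched_slots)
    pos_diversity = len({pos for _, pos in matched_slots})
    return count + 0.5 * pos_diversity  # weight 0.5 is illustrative
```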
The syntactic scoring mode rewards matching intention templates whose syntactic structures are more complex, since these can express the user's intention more completely. For example, an intention template that can capture complex syntactic structures such as negation or rhetorical questions is of higher quality and scores higher.
The implementation is described in detail below through several embodiments.
Referring to fig. 2, the method for determining a voice intention in the present embodiment includes:
step 201: an input speech is obtained.
Step 202: a contextual context associated with the speech is obtained. The context includes at least one of: the application module is used for obtaining the time of the voice, the position of the user providing the voice, the environment information of the environment where the user is located, the user image information of the user and the application state information.
Step 203: and carrying out acoustic analysis on the voice to obtain acoustic characteristic information corresponding to the voice content.
Step 204: a dialog context for the speech is obtained.
Step 205: and matching the voice with the historical voice of the user providing the voice to obtain historical information.
The steps 202 to 205 are relatively independent steps, and the execution sequence may be interchanged or performed synchronously.
Step 206: matching the voice content of the voice and the scene context with at least one of the following information and a preset intention template; wherein the following information includes: the acoustic feature information, the dialog context, and the history information.
Step 207: and determining the voice intention according to the matched and consistent intention template.
Referring to fig. 3, the method for determining a voice intention in the present embodiment includes:
step 301: an input speech is obtained.
Step 302: a contextual context associated with the speech is obtained. The context includes at least one of: the application module is used for obtaining the time of the voice, the position of the user providing the voice, the environment information of the environment where the user is located, the user image information of the user and the application state information.
Step 303: and carrying out acoustic analysis on the voice to obtain acoustic characteristic information corresponding to the voice content.
Step 304: a dialog context for the speech is obtained.
Step 305: and matching the voice with the historical voice of the user providing the voice to obtain historical information.
The steps 302 to 305 are relatively independent steps, and the execution sequence can be interchanged or can be performed synchronously.
Step 306: matching the voice content of the voice and the scene context with at least one of the following information and a preset intention template; wherein the following information includes: the acoustic feature information, the dialog context, and the history information.
Step 307: when at least two matched intention templates exist, scoring is carried out on the at least two intention templates according to multiple preset scoring modes and the priority of each scoring mode.
Step 308: and when the highest-grade intention template is obtained, determining the voice intention according to the highest-grade intention template.
The above embodiments can be freely combined according to actual needs.
The above describes how determining a voice intention is implemented; this can be realized by a device, whose internal structure and functions are described below.
Referring to fig. 4, the apparatus for determining a voice intention in the present embodiment includes: a speech module 401, a context module 402, a matching module 403 and an intent module 404.
A voice module 401, configured to obtain an input voice.
A context module 402 configured to obtain a context associated with the speech.
A matching module 403, configured to match the voice content of the voice and the scene context against preset intention templates.
An intent module 404, configured to determine the voice intention according to the matching intention template.
Optionally, the scene context includes at least one of: the time at which the voice is obtained, the position of the user providing the voice, environment information of the environment where the user is located, user portrait information of the user, and application state information of an application module.
Optionally, as shown in fig. 5, the apparatus further includes at least one of the following: an acoustic module 501, a dialogue module 502, and a history module 503.
The acoustic module 501 is configured to perform acoustic analysis on the speech to obtain acoustic feature information corresponding to speech content.
A dialog module 502 for obtaining a dialog context for the speech.
A history module 503, configured to match the voice with a history voice of a user providing the voice, so as to obtain history information.
As shown in fig. 6, the matching module 403 includes: a matching sub-module 601.
A matching sub-module 601, configured to match the voice content of the voice, the scene context, and at least one of the following information against preset intention templates; wherein the following information includes: the acoustic feature information, the dialog context, and the history information.
Optionally, as shown in fig. 7, when there are at least two matching intention templates, the apparatus further includes: a scoring module 701.
The scoring module 701 is configured to score the at least two intention templates according to a plurality of preset scoring manners and priorities of the scoring manners.
As shown in fig. 8, the intent module 404 includes: intention submodule 801.
The intention submodule 801 is configured to, when a single highest-scoring intention template is obtained, determine the voice intention according to that highest-scoring intention template.
Optionally, the multiple scoring modes include, in order of priority from high to low: a template scoring mode, a lexical scoring mode and a syntactic scoring mode.
An apparatus to determine a speech intent, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
obtaining an input voice;
obtaining a context associated with the speech;
matching the voice content of the voice and the scene context against preset intention templates;
and determining the voice intention according to the matching intention template.
A computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the above method.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (12)
1. A method of determining a voice intention, comprising:
obtaining an input voice;
obtaining a scenario context associated with the voice;
matching voice content of the voice and the scenario context against preset intention templates;
and determining the voice intention according to the intention template that is successfully matched.
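As a rough illustration of claim 1, the matching step can be sketched as below. This is a minimal sketch under assumed data shapes: the template format (`phrase`, `context`, `intent` fields) and all names are hypothetical, not defined by the patent.

```python
# Illustrative sketch of claim 1: determine a voice intention by matching
# recognized voice content and the scenario context against preset intention
# templates. The template structure here is hypothetical.

def determine_intent(voice_content, context, templates):
    """Return the intent of the first template whose phrase appears in the
    voice content and whose context constraints all hold, or None."""
    for tpl in templates:
        phrase_ok = tpl["phrase"] in voice_content
        context_ok = all(context.get(k) == v for k, v in tpl["context"].items())
        if phrase_ok and context_ok:
            return tpl["intent"]
    return None

TEMPLATES = [
    {"phrase": "turn on the light", "context": {"location": "home"}, "intent": "light_on"},
    {"phrase": "turn on the light", "context": {"location": "car"}, "intent": "headlight_on"},
]

# Same utterance, different scenario context, different intent:
print(determine_intent("please turn on the light", {"location": "car"}, TEMPLATES))
# -> headlight_on
```

The point of the context constraint is visible in the example: the same phrase resolves to different intents depending on where the user is.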
2. The method of claim 1, wherein the scenario context comprises at least one of: a time at which the voice is obtained, a location of the user providing the voice, environment information of the environment where the user is located, user profile information of the user, and application state information.
3. The method of claim 1, wherein the method further comprises at least one of:
performing acoustic analysis on the voice to obtain acoustic feature information corresponding to the voice content;
obtaining a dialog context of the voice;
matching the voice against historical voices of the user providing the voice to obtain history information;
wherein the matching voice content of the voice and the scenario context against preset intention templates comprises:
matching the voice content of the voice, the scenario context, and at least one of the following information against the preset intention templates, wherein the following information comprises: the acoustic feature information, the dialog context, and the history information.
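Claim 3 extends the matching input with optional signals. A hedged sketch of how those signals might be folded into one matching input is shown below; the field names and the dict-based representation are illustrative assumptions, not taken from the patent.

```python
# Hypothetical sketch of claim 3: merge the mandatory inputs (voice content,
# scenario context) with whichever optional signals are available (acoustic
# features, dialog context, history information) into a single structure
# that a template matcher can inspect uniformly.

def build_match_input(voice_content, context, acoustic=None, dialog=None, history=None):
    """Return a dict containing the mandatory fields plus only those
    optional signals that were actually provided."""
    match_input = {"content": voice_content, "context": dict(context)}
    optional = {"acoustic": acoustic, "dialog": dialog, "history": history}
    match_input.update({k: v for k, v in optional.items() if v is not None})
    return match_input

query = build_match_input(
    "turn it up",
    {"location": "car"},
    dialog=["play some jazz"],  # the previous turn helps resolve what "it" refers to
)
print(sorted(query))  # keys actually present in the match input
# -> ['content', 'context', 'dialog']
```

Absent signals simply do not appear, which matches the claim's "at least one of" phrasing: templates can condition on a signal only when it is present.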
4. The method of claim 1, wherein, when at least two intention templates are successfully matched, the method further comprises:
scoring the at least two intention templates according to a plurality of preset scoring modes and the priority of each scoring mode;
wherein the determining the voice intention according to the successfully matched intention template comprises:
when a highest-scoring intention template is obtained, determining the voice intention according to the highest-scoring intention template.
5. The method of claim 4, wherein the plurality of scoring modes comprises, in descending order of priority: a template scoring mode, a lexical scoring mode, and a syntactic scoring mode.
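The tie-breaking in claims 4 and 5 can be sketched with lexicographic comparison of score tuples: the template score dominates, and the lexical and syntactic scores only matter when higher-priority scores tie. The scoring values below are stand-ins; the patent does not define how each scoring mode computes its score.

```python
# Hypothetical sketch of claims 4-5: when several intention templates match,
# score each candidate under prioritized scoring modes
# (template > lexical > syntactic) and keep the best one.

PRIORITY = ("template_score", "lexical_score", "syntactic_score")

def pick_best(candidates):
    """Compare score tuples lexicographically: a higher template score wins
    outright, and lower-priority scores only break ties."""
    return max(candidates, key=lambda c: tuple(c[m] for m in PRIORITY))

candidates = [
    {"intent": "navigate", "template_score": 0.9, "lexical_score": 0.4, "syntactic_score": 0.8},
    {"intent": "search",   "template_score": 0.9, "lexical_score": 0.7, "syntactic_score": 0.2},
]

# Template scores tie at 0.9, so the lexical score decides:
print(pick_best(candidates)["intent"])
# -> search
```

Python's tuple comparison gives the prioritized behavior for free: `(0.9, 0.7, 0.2) > (0.9, 0.4, 0.8)` because the first differing element wins.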
6. An apparatus for determining a voice intention, comprising:
a voice module, configured to obtain an input voice;
a context module, configured to obtain a scenario context associated with the voice;
a matching module, configured to match voice content of the voice and the scenario context against preset intention templates;
and an intention module, configured to determine the voice intention according to the intention template that is successfully matched.
7. The apparatus of claim 6, wherein the scenario context comprises at least one of: a time at which the voice is obtained, a location of the user providing the voice, environment information of the environment where the user is located, user profile information of the user, and application state information.
8. The apparatus of claim 6, wherein the apparatus further comprises at least one of:
an acoustic module, configured to perform acoustic analysis on the voice to obtain acoustic feature information corresponding to the voice content;
a dialog module, configured to obtain a dialog context of the voice;
a history module, configured to match the voice against historical voices of the user providing the voice to obtain history information;
wherein the matching module comprises:
a matching sub-module, configured to match the voice content of the voice, the scenario context, and at least one of the following information against the preset intention templates, wherein the following information comprises: the acoustic feature information, the dialog context, and the history information.
9. The apparatus of claim 6, wherein, when at least two intention templates are successfully matched, the apparatus further comprises:
a scoring module, configured to score the at least two intention templates according to a plurality of preset scoring modes and the priority of each scoring mode;
wherein the intention module comprises:
an intention sub-module, configured to, when a highest-scoring intention template is obtained, determine the voice intention according to the highest-scoring intention template.
10. The apparatus of claim 9, wherein the plurality of scoring modes comprises, in descending order of priority: a template scoring mode, a lexical scoring mode, and a syntactic scoring mode.
11. An apparatus for determining a voice intention, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
obtain an input voice;
obtain a scenario context associated with the voice;
match voice content of the voice and the scenario context against preset intention templates;
and determine the voice intention according to the intention template that is successfully matched.
12. A computer-readable storage medium having stored thereon computer instructions, which when executed by a processor, implement the steps of the method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010929640.6A CN112185374A (en) | 2020-09-07 | 2020-09-07 | Method and device for determining voice intention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112185374A true CN112185374A (en) | 2021-01-05 |
Family
ID=73925647
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010929640.6A Pending CN112185374A (en) | 2020-09-07 | 2020-09-07 | Method and device for determining voice intention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112185374A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115576216A (en) * | 2022-12-09 | 2023-01-06 | 深圳市人马互动科技有限公司 | Information filling method and device based on voice control intelligent household appliance |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106845624A (en) * | 2016-12-16 | 2017-06-13 | 北京光年无限科技有限公司 | The multi-modal exchange method relevant with the application program of intelligent robot and system |
CN107316635A (en) * | 2017-05-19 | 2017-11-03 | 科大讯飞股份有限公司 | Audio recognition method and device, storage medium, electronic equipment |
CN107633844A (en) * | 2017-10-10 | 2018-01-26 | 杭州嘉楠耘智信息科技股份有限公司 | Apparatus control method and device |
CN109326289A (en) * | 2018-11-30 | 2019-02-12 | 深圳创维数字技术有限公司 | Exempt to wake up voice interactive method, device, equipment and storage medium |
CN109918673A (en) * | 2019-03-14 | 2019-06-21 | 湖北亿咖通科技有限公司 | Semantic referee method, device, electronic equipment and computer readable storage medium |
CN110705267A (en) * | 2019-09-29 | 2020-01-17 | 百度在线网络技术(北京)有限公司 | Semantic parsing method, semantic parsing device and storage medium |
CN110990685A (en) * | 2019-10-12 | 2020-04-10 | 中国平安财产保险股份有限公司 | Voice search method, voice search device, voice search storage medium and voice search device based on voiceprint |
CN111261146A (en) * | 2020-01-16 | 2020-06-09 | 腾讯科技(深圳)有限公司 | Speech recognition and model training method, device and computer readable storage medium |
CN111508482A (en) * | 2019-01-11 | 2020-08-07 | 阿里巴巴集团控股有限公司 | Semantic understanding and voice interaction method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||