CN116612751A - Intention recognition method, device, electronic equipment and storage medium - Google Patents
Intention recognition method, device, electronic equipment and storage medium
- Publication number
- CN116612751A (application CN202210118122.5A)
- Authority
- CN
- China
- Prior art keywords
- intention
- voice
- recognized
- determining
- intent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G10L15/00—Speech recognition (G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING), in particular:
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/063—Training (under G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
- G10L15/07—Adaptation to the speaker (under G10L15/065—Adaptation)
- G10L15/1822—Parsing for meaning understanding (under G10L15/18—Speech classification or search using natural language modelling)
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26—Speech to text systems
- G10L2015/223—Execution procedure of a spoken command
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS] (under Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation; Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS)
Abstract
The application discloses an intention recognition method and apparatus, an electronic device, and a storage medium, wherein the intention recognition method comprises the following steps: extracting features of the voice to be recognized to obtain feature vectors; determining an intention set corresponding to the voice to be recognized according to the feature vectors, wherein the intention set comprises a plurality of intentions and a probability value corresponding to each intention; inputting the intention set into a classification model to obtain a correlation value for each intention; and determining the intention corresponding to the voice to be recognized according to the obtained correlation values. The method improves both the accuracy and the expandability of intention recognition.
Description
Technical Field
The present application relates to the field of intent recognition technologies, and in particular, to an intent recognition method, apparatus, electronic device, and storage medium.
Background
In a dialogue system, the semantic understanding system is responsible for natural language understanding tasks, mainly processing sentences input by a user and extracting the user's dialogue intention. A typical semantic understanding system first predicts the domain to which the input voice belongs, then extracts key information via keywords, and performs intention recognition on the extracted key information.
However, when intention recognition is performed in this manner, because the domain to which the voice belongs is determined first and intention recognition is then confined to that domain, other domains of comparably high probability are ignored, and recognition errors occur. Meanwhile, when the system is extended, it must be optimized and adjusted as a whole, so it lacks expandability.
Therefore, there is a need for an intention recognition method that improves both the accuracy and the expandability of intention recognition.
Disclosure of Invention
The embodiments of the present application aim to provide an intention recognition method, an intention recognition device, electronic equipment, and a storage medium, which are used for improving both the accuracy and the expandability of intention recognition.
In a first aspect, to achieve the above object, an embodiment of the present application provides an intent recognition method, including:
extracting features of the voice to be recognized to obtain feature vectors;
determining an intention set corresponding to the voice to be recognized according to the feature vector, wherein the intention set comprises a plurality of intentions and probability values corresponding to each intention;
inputting the intention set into a classification model to obtain a correlation value for each intention;
and determining the intention corresponding to the voice to be recognized according to the obtained correlation value.
In a second aspect, in order to solve the same technical problem, an embodiment of the present application provides an intention recognition apparatus, including:
the feature extraction module is used for extracting features of the voice to be recognized to obtain feature vectors;
the intention prediction module is used for determining an intention set corresponding to the voice to be recognized according to the feature vector, wherein the intention set comprises a plurality of intentions and a probability value corresponding to each intention;
the association determining module is used for inputting the intention set into the classification model to obtain a correlation value for each intention;
and the intention determining module is used for determining the intention corresponding to the voice to be recognized according to the obtained correlation value.
In a third aspect, to solve the same technical problem, an embodiment of the present application provides an electronic device, including a processor, a memory coupled to the processor, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the steps in any one of the intention recognition methods described above.
In a fourth aspect, in order to solve the same technical problem, an embodiment of the present application provides a computer readable storage medium storing a computer program, wherein, when the computer program runs, the device on which the computer readable storage medium resides is controlled to execute the steps in any one of the intention recognition methods described above.
The embodiments of the present application provide an intention recognition method, an intention recognition device, electronic equipment, and a storage medium. When intention recognition is performed, the received voice information is converted into text and subjected to error correction; feature vectors under the different intention domains are then obtained through feature extraction, and intention recognition is performed in each intention domain to obtain the intention corresponding to that domain. The obtained intentions are then processed in a classification model to obtain a correlation value for each intention, and the intention corresponding to the voice to be recognized is finally determined according to the obtained correlation values. Because the actual intention is predicted and judged across all domains before the final actual intention is determined by classification, no partial intention is ignored, which improves the accuracy of intention recognition; and because intention recognition for further domains can be added as required, the expandability of intention recognition is also improved.
Drawings
FIG. 1 is a schematic flow chart of an intent recognition method according to an embodiment of the present application;
FIG. 2 is a flow chart illustrating steps for obtaining feature vectors according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an intent recognition model according to an embodiment of the present application;
FIG. 4 is a flow chart illustrating steps for determining correlation values according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an apparatus for recognizing intent according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of another electronic device according to an embodiment of the present application.
Detailed Description
The following clearly and completely describes the embodiments of the present application with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below.
Referring to fig. 1, fig. 1 is a flowchart of an intent recognition method according to an embodiment of the present application, and as shown in fig. 1, the intent recognition method according to an embodiment of the present application includes steps S101 to S104.
Step S101, extracting features of the voice to be recognized to obtain feature vectors.
To perform intention recognition on received voice information, the input voice information on which intention recognition is to be performed is first received and correspondingly processed, so that a result usable for intention recognition is obtained. Specifically, the voice to be recognized is obtained, and feature extraction is performed on the voice to be recognized to obtain the feature vectors corresponding to the voice to be recognized.
When extracting the characteristics of the voice to be recognized, firstly converting the voice to be recognized into corresponding text information, and then processing the obtained text information to realize the extraction of the characteristic vector. Specifically, referring to fig. 2, fig. 2 is a flow chart illustrating steps of obtaining feature vectors according to an embodiment of the present application, wherein the steps include steps S201 to S203.
Step S201, converting the voice to be recognized into corresponding text information;
step S202, carrying out correction text processing on the text information;
Step S203, performing word segmentation processing on the corrected text information, and obtaining a feature vector according to the word segmentation result.
In an embodiment, when features are extracted to obtain feature vectors, the received voice to be recognized is first converted into corresponding text information, and the feature vectors are then determined from that text information. Specifically, text conversion is performed on the voice to be recognized to obtain text information, error correction processing is performed on the text information, and the corresponding feature vectors are finally obtained from the corrected text.
When the feature vector is obtained from the voice to be recognized, the voice to be recognized is first converted into corresponding text information; the specific conversion manner is not limited and may be implemented based on natural language processing technology. After the text information corresponding to the voice to be recognized is obtained, word segmentation processing is performed on the text information, and feature extraction is then performed using keywords.
In practical applications, when the received voice to be recognized is converted into corresponding text information, the text information may contain errors, such as inverted word order or conversion mistakes. Therefore, after the text information is obtained, correction processing is performed on the text to fix such errors, and feature extraction is performed only after the text information has been corrected.
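As a non-limiting illustration, the sketch below shows one way steps S201 and S202 might be composed in Python. The `asr_transcribe` stub and the rule table are assumptions for illustration only; the embodiment does not fix a particular speech-to-text engine or correction algorithm.

```python
# Minimal sketch of steps S201-S202, assuming an external speech-to-text
# engine (the `asr_transcribe` stub) and a toy rule table for the error
# correction described above; neither is fixed by the embodiment.
CORRECTION_RULES = {
    "king pesticide": "king glory",  # homophone/nickname fix, see the example below
}

def asr_transcribe(audio: bytes) -> str:
    """Stand-in for any speech recognition engine (step S201)."""
    raise NotImplementedError("plug in a real ASR engine here")

def correct_text(text: str) -> str:
    """Rule-based error correction of the transcript (step S202)."""
    for wrong, right in CORRECTION_RULES.items():
        text = text.replace(wrong, right)
    return text

def speech_to_corrected_text(audio: bytes) -> str:
    return correct_text(asr_transcribe(audio))
```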
When word segmentation is performed on the text information, it is performed separately for different domains, the number of which is determined in advance. Specifically, when the intention recognition system is built, intention recognition models for the different domains are constructed in advance and optimized through training, and intention recognition is then performed for the different domains during use. The constructed intention recognition model may be as shown in fig. 3, which is a schematic structural diagram of the intention recognition model according to an embodiment of the present application.
In the intention recognition model, the intention recognition module is composed of several NLU (natural language understanding) models, each of which corresponds to one domain, and all of which are processed in parallel. When the voice to be recognized is received, the corresponding text information is obtained through correction and other processing by the preprocessing module, and the text information is then input into each NLU model, so that intention recognition of the voice to be recognized is performed in each domain.
That is, when the converted text information is processed to obtain the feature vectors, the corrected text information is input into each NLU model, so that feature extraction of the text information is performed for the different fields. After feature extraction is completed, the judgment and prediction of intention recognition are carried out in each field according to the obtained features.
When the feature vectors are obtained, a corresponding feature vector may be obtained in each field, thereby realizing intention recognition in the different fields. Specifically, this comprises: performing word segmentation processing on the corrected text information according to the intention domain to obtain a word segmentation set corresponding to each intention domain; and extracting keywords from the word segmentation set, and determining the feature vector of the voice to be recognized in the corresponding intention domain according to the keywords.
In fact, in the constructed intention recognition model, the intention recognition module is composed of a plurality of NLU models for different fields. Because different words or phrases are composed differently and carry different meanings in different fields, different processing is required per field when word segmentation is performed to extract the feature vectors. Specifically, one intention domain corresponds to one field scene, that is, one intention domain represents one usage scene; for example, the movie field is one intention domain, the sports field is another, and the game field is another. When the feature vectors are acquired, word segmentation processing is performed on the corrected text information according to each intention domain, and the feature vector corresponding to each intention domain is then determined from that domain's word segmentation set.
For example, in the intention domain corresponding to the movie field, "false kill" is a movie title, while in the intention domain corresponding to the legal field, "false kill" denotes a type of case. Moreover, in the process of converting voice into text, the pinyin of "false kill", "wusha", can also be transcribed as "five kills", and "five kills" is a specific term in the intention domain corresponding to the game field.
Therefore, after the voice to be recognized is converted into corresponding text information, the text information is corrected first. At this time, the text can be adjusted according to the meaning of the whole sentence; for example, when the voice to be recognized is game-related, "wusha" is converted into "five kills" instead of "false kill". At the same time, erroneous words are corrected, for example "king pesticide" (a homophonic nickname) is corrected into "king glory".
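For illustration only, the following sketch shows per-domain word segmentation and keyword-based feature extraction, producing one feature vector per intention domain as described above. The keyword lists, the substring matching standing in for a real segmenter, and the binary bag-of-keywords encoding are all assumptions; the embodiment does not prescribe a particular segmentation tool or featurization.

```python
from typing import Dict, List

# Hypothetical keyword lists, one per intention domain.
DOMAIN_KEYWORDS: Dict[str, List[str]] = {
    "movie": ["play", "film", "actor"],
    "sports": ["score", "match", "team"],
    "game": ["five kills", "rank", "hero"],
}

def domain_feature_vectors(text: str) -> Dict[str, List[float]]:
    """Produce one keyword-presence vector per intention domain."""
    lowered = text.lower()
    vectors = {}
    for domain, keywords in DOMAIN_KEYWORDS.items():
        # one slot per domain keyword: 1.0 if it appears in the text, else 0.0
        vectors[domain] = [1.0 if kw in lowered else 0.0 for kw in keywords]
    return vectors
```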
Step S102, determining an intention set corresponding to the voice to be recognized according to the feature vector, wherein the intention set comprises a plurality of intentions and a probability value corresponding to each intention.
After feature extraction is performed on the voice to be recognized to obtain the corresponding feature vectors, intention recognition is predicted and judged from the obtained feature vectors. Specifically, since each intention domain corresponds to one feature vector, each feature vector yields one output result, namely the intention output in the corresponding intention domain. After this initial recognition is completed in all intention domains, an intention set is obtained, which includes a plurality of intentions and a probability value corresponding to each of them.
As can be seen from fig. 3, during intention recognition the received voice to be recognized is input into each NLU model for initial recognition and judgment of the intention, and prediction results for the voice to be recognized in all of the different domains are determined. When one NLU model produces its prediction output, all possible intentions are first obtained together with a probability value for each intention, where the probability values of all intentions within one NLU model sum to 1; the intention result of the voice to be recognized in that domain, comprising the intention and its probability value, is then determined according to the magnitudes of the probability values.
When each NLU model is processed, the intentions and probability values in its domain are obtained, and the actual output of an NLU model is determined by the probability values of those intentions; specifically, the intention with the largest probability value may be selected as the initial prediction result of that NLU model.
Therefore, when the intention corresponding to the voice to be recognized is determined from the feature vectors, an intention set is obtained, which specifically includes the intention obtained by performing recognition processing in each intention domain. In practice, determining the intention set includes: inputting each feature vector into the corresponding intermediate prediction model, outputting the maximum-probability intention corresponding to each feature vector, and obtaining the intention set corresponding to the voice to be recognized based on the maximum-probability intentions.
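A minimal sketch of step S102 follows, assuming each domain's NLU model is a linear layer followed by a softmax; any per-domain classifier whose probability values sum to 1 matches the description above, and the weight matrices are assumed to be supplied by prior training.

```python
import math
from typing import Dict, List, Tuple

def softmax(scores: List[float]) -> List[float]:
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def intention_set(
    vectors: Dict[str, List[float]],
    nlu_models: Dict[str, Tuple[List[str], List[List[float]]]],
) -> List[Tuple[str, str, float]]:
    """Return one (domain, intent, probability) triple per intention domain."""
    result = []
    for domain, vec in vectors.items():
        labels, weights = nlu_models[domain]   # intent labels and weight matrix
        scores = [sum(w * x for w, x in zip(row, vec)) for row in weights]
        probs = softmax(scores)                # probabilities sum to 1 per domain
        best = max(range(len(labels)), key=probs.__getitem__)
        result.append((domain, labels[best], probs[best]))  # max-probability intent
    return result
```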
Step S103, inputting the intention set into a classification model to obtain a correlation value of each intention.
The classification model includes, but is not limited to, a logistic regression model, an SVM model, and the like.
When the intention set corresponding to the voice to be recognized has been obtained, the intention of the received voice is one of the intentions in that set. In determining the actual intention, a reasonable judgment must be made based on actual conditions, such as user preferences and context. Thus, once the intention set is obtained, it is input into the classification model to determine a correlation value for each intention in the set. The correlation value represents the probability that an intention is the true intention: the larger the correlation value, the more relevant the intention; conversely, a weak correlation indicates that the intention is not the recognized intention.
Referring to fig. 4, fig. 4 is a flow chart illustrating a step of determining a correlation value according to an embodiment of the present application. Wherein the step includes steps S401 to S402.
Step S401, obtaining preference information according to historical intention data;
step S402, inputting the intention set and the preference information into the classification model to obtain a correlation value of each intention and the preference information.
In determining the correlation values, factor information used for the determination, such as user preference and resource popularity, is first acquired, and the obtained factor information is then input into the classification model together with the intention set to calculate the magnitude of the correlation between each intention in the intention set and the factor information.
Therefore, when the intention set is obtained, historical intention data is also obtained, the current preference information is determined from that historical data, and the preference information and the intention set are input together into the classification model, which predicts a correlation value between each intention and the preference information.
The factor information, namely the user's preference information, is obtained from information within a certain period of time and is updated in real time according to actual responses. For example, the user's search and viewing records within the last 24 hours may be obtained, and the results screened, summarized and analyzed to determine the final factor information, such as determining that the user has preferred to view sports information within the last 24 hours.
When the preference information is determined, historical intention data within a certain period of time is obtained, where the historical intention data records the user's input voice information, the actual intention results, and the corresponding responses. By analyzing the historical intention data, the actual preference information, such as the user's preferred domain, is determined, and the user's intention is further determined according to that preference.
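Combining steps S401 and S402, the sketch below derives preference information from the last 24 hours of historical intention data and then scores each candidate intention with a logistic-style classifier. The two-feature design (model probability plus domain preference share) and the fixed weights are illustrative assumptions standing in for the trained classification model.

```python
import math
from typing import Dict, List, Tuple

def preference_info(
    history: List[Tuple[float, str]], now: float
) -> Dict[str, float]:
    """history holds (unix_timestamp, domain) pairs; keep only the last 24 h."""
    recent = [domain for ts, domain in history if now - ts <= 24 * 3600]
    if not recent:
        return {}
    return {d: recent.count(d) / len(recent) for d in set(recent)}

def correlation_values(
    intents: List[Tuple[str, str, float]],       # (domain, intent, probability)
    preference: Dict[str, float],
    weights: Tuple[float, float] = (2.0, 3.0),   # assumed, stands in for training
    bias: float = -1.0,
) -> List[Tuple[str, float]]:
    scored = []
    for domain, label, prob in intents:
        z = bias + weights[0] * prob + weights[1] * preference.get(domain, 0.0)
        scored.append((label, 1.0 / (1.0 + math.exp(-z))))  # sigmoid correlation
    return scored
```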
In one embodiment, in addition to determining the user's intention based on the user's actual preferences, the determination may be made based on resource popularity, or on a combination of user preferences and resource popularity. For example, when the user interacts with a device or terminal and wants to watch a movie, the information displayed on the terminal or device may include related movies selected according to the user's preferences, and may also include one or more movies that are currently popular.
In practical applications, the user's preferences have a great influence on the overall intention determination: if the user is listening to songs, intention recognition is biased towards music; if the user is watching a movie, the intention is biased towards movies. Thus, after the intention set corresponding to the voice to be recognized is obtained, preference information associated with the user, such as the user's history, may also be obtained, and the actual correlation value may be determined by calculating the similarity between each intention and the preference information.
For example, if the user's preference is sports, then when the actual intention is determined from the resulting intention set, the intention at that time should clearly be biased toward sports.
For another example, if the user's previous dialogue was a movie query, then when the actual intention is determined from the resulting intention set, the intention should clearly be biased toward movies, such as a movie title or an actor's name.
The classification model currently in use is obtained through training and optimization in advance and embedded in the constructed intention recognition model shown in fig. 3. During actual training, the model is trained and optimized by inputting groups of factor information and intentions, together with their corresponding correlation values, into the model.
Step S104, determining the intention corresponding to the voice to be recognized according to the obtained correlation value.
After the correlation value corresponding to each intention in the intention set is determined, the intention currently available for response is determined from these correlation values. Specifically, the intention corresponding to the voice to be recognized is determined according to the magnitudes of the correlation values, and one or several intentions may be selected on that basis.
In an embodiment, when the intention corresponding to the voice to be recognized is determined, a preset correlation threshold is provided, and each obtained correlation value is compared with this threshold, thereby screening the intention set to obtain the intentions whose correlation values are larger than the threshold; the number of intentions obtained by this comparison may be one or several.
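A sketch of this threshold screening, under the assumption of a preset threshold of 0.5 (the embodiment leaves the value open), might read:

```python
from typing import List, Tuple

def screen_intents(
    scored: List[Tuple[str, float]], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    # keep every intent whose correlation value exceeds the preset threshold;
    # one or several intents may survive, ordered strongest first
    kept = [(label, corr) for label, corr in scored if corr > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```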
In addition, after the intention corresponding to the voice to be recognized is obtained, a response is made to the determined intention, such as querying and displaying information or playing a piece of music or a movie. Specifically, after the intention is obtained, a control instruction corresponding to the current intention is generated, and the responding device or terminal is controlled to respond to that instruction.
For example, when a user interacts with a smart television, the user controls the smart television to query or play programs by voice interaction. The user speaks, the smart television receives the output voice information, performs intention recognition on it to determine the user's real intention, and finally responds according to that intention.
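The smart-television example might be wired up as in the sketch below; the handler table, the intent labels, and the `device.execute` interface are hypothetical and only illustrate generating a control instruction from the recognized intention.

```python
from typing import Callable, Dict

# Hypothetical mapping from recognized intents to control instructions.
RESPONSE_HANDLERS: Dict[str, Callable[[dict], str]] = {
    "play_program": lambda slots: f"PLAY {slots.get('title', '')}",
    "query_program": lambda slots: f"QUERY {slots.get('title', '')}",
}

def respond(intent: str, slots: dict, device) -> None:
    instruction = RESPONSE_HANDLERS[intent](slots)  # generate the control instruction
    device.execute(instruction)                     # controlled device responds
```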
In summary, in the intention recognition method provided by the embodiments of the present application, when intention recognition is performed, text conversion and error correction processing are first performed on the received voice information; feature vectors under the different intention domains are then obtained through feature extraction, and intention recognition is performed in each intention domain to obtain the intention corresponding to that domain. Each obtained intention is then processed in the classification model to obtain its correlation value, and the intention corresponding to the voice to be recognized is finally determined according to the obtained correlation values. Because the actual intention is predicted and judged across all domains before the final actual intention is determined by classification, no partial intention is ignored, which improves the accuracy of intention recognition; and since intention recognition for additional domains can be added on demand, the expandability of intention recognition is also improved.
According to the method described in the above embodiments, this embodiment will be further described from the perspective of an intention recognition device, which may be implemented as a separate entity or integrated in an electronic device, such as a terminal, which may include a mobile phone, a tablet computer, or the like.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an intention recognition device according to an embodiment of the present application, and as shown in fig. 5, an intention recognition device 500 according to an embodiment of the present application includes:
the feature extraction module 501 is configured to perform feature extraction on a voice to be recognized to obtain a feature vector;
an intention prediction module 502, configured to determine, according to the feature vector, an intention set corresponding to the speech to be recognized, where the intention set includes a plurality of intents and a probability value corresponding to each intention;
a correlation determination module 503, configured to input the intent set into a classification model, to obtain a correlation value of each intent;
and the intention determining module 504 is configured to determine an intention corresponding to the voice to be recognized according to the obtained correlation value.
In the implementation, each module and/or unit may be implemented as an independent entity, or may be combined arbitrarily and implemented as the same entity or a plurality of entities, where the implementation of each module and/or unit may refer to the foregoing method embodiment, and the specific beneficial effects that may be achieved may refer to the beneficial effects in the foregoing method embodiment, which are not described herein again.
In addition, referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application, where the electronic device may be a mobile terminal, such as a smart phone, a tablet computer, or the like. As shown in fig. 6, the electronic device 600 includes a processor 601, a memory 602. The processor 601 is electrically connected to the memory 602.
The processor 601 is a control center of the electronic device 600, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device 600 and processes data by running or loading application programs stored in the memory 602 and calling data stored in the memory 602, thereby performing overall monitoring of the electronic device 600.
In this embodiment, the processor 601 in the electronic device 600 loads instructions corresponding to the processes of one or more application programs into the memory 602 according to the following steps, and the processor 601 executes the application programs stored in the memory 602, so as to implement various functions:
extracting features of the voice to be recognized to obtain feature vectors;
determining an intention set corresponding to the voice to be recognized according to the feature vector, wherein the intention set comprises a plurality of intentions and probability values corresponding to each intention;
inputting the intention set into a classification model to obtain a correlation value for each intention;
and determining the intention corresponding to the voice to be recognized according to the obtained correlation value.
The electronic device 600 can implement the steps in any embodiment of the intention recognition method provided by the embodiments of the present application, and can therefore achieve the beneficial effects of any of those methods, which are detailed in the previous embodiments and not repeated herein.
Referring to fig. 7, fig. 7 is another schematic structural diagram of an electronic device according to an embodiment of the present application, and fig. 7 is a specific structural block diagram of the electronic device according to the embodiment of the present application, where the electronic device may be used to implement the intent recognition method provided in the above embodiment. The electronic device 700 may be a mobile terminal such as a smart phone or a notebook computer.
The RF circuit 710 is configured to receive and transmit electromagnetic waves, and to perform mutual conversion between electromagnetic waves and electrical signals, thereby communicating with a communication network or other devices. The RF circuitry 710 may include various existing circuit elements for performing these functions, such as an antenna, a radio frequency transceiver, a digital signal processor, an encryption/decryption chip, a Subscriber Identity Module (SIM) card, memory, and so forth. The RF circuitry 710 may communicate with various networks, such as the internet, an intranet or a wireless network, or with other devices via a wireless network. The wireless network may include a cellular telephone network, a wireless local area network, or a metropolitan area network. The wireless network may use various communication standards, protocols, and technologies including, but not limited to, Global System for Mobile Communication (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (WCDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Wireless Fidelity (Wi-Fi) (e.g., IEEE 802.11a, IEEE 802.11b, IEEE 802.11g and/or IEEE 802.11n), Voice over Internet Protocol (VoIP), Worldwide Interoperability for Microwave Access (Wi-Max), other protocols for mail, instant messaging, and short messaging, and any other suitable communication protocol, including protocols not yet developed.
The memory 720 may be used to store software programs and modules, such as program instructions/modules corresponding to the intent recognition method in the above embodiments, and the processor 780 executes the software programs and modules stored in the memory 720 to perform various functional applications and intent recognition, i.e., to implement the following functions:
extracting features of the voice to be recognized to obtain feature vectors;
determining an intention set corresponding to the voice to be recognized according to the feature vector, wherein the intention set comprises a plurality of intentions and probability values corresponding to each intention;
inputting the intention set into a classification model to obtain a correlation value for each intention;
and determining the intention corresponding to the voice to be recognized according to the obtained correlation value.
Memory 720 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 720 may further include memory located remotely from processor 780, which may be connected to electronic device 700 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input unit 730 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, the input unit 730 may include a touch-sensitive surface 731 and other input devices 732. The touch-sensitive surface 731, also referred to as a touch display screen or touch pad, may collect touch operations on or near it (e.g., operations by the user on or near the touch-sensitive surface 731 using any suitable object or accessory such as a finger or stylus) and actuate the corresponding connection device according to a preset program. Alternatively, the touch-sensitive surface 731 may comprise two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends them to the processor 780, and it can also receive and execute commands from the processor 780. In addition, the touch-sensitive surface 731 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch-sensitive surface 731, the input unit 730 may also include other input devices 732, which may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 740 may be used to display information entered by or provided to the user, as well as the various graphical user interfaces of the electronic device 700, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 740 may include a display panel 741, which may optionally be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface 731 may overlay the display panel 741; upon detecting a touch operation on or near it, the touch-sensitive surface 731 passes the operation to the processor 780 to determine the type of touch event, and the processor 780 then provides a corresponding visual output on the display panel 741 based on that type. Although in the figures the touch-sensitive surface 731 and the display panel 741 are implemented as two separate components, in some embodiments they may be integrated to implement the input and output functions.
The electronic device 700 may also include at least one sensor 750, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor, which may adjust the brightness of the display panel 741 according to the brightness of ambient light, and a proximity sensor, which may turn off the display when the device is closed or held against the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when the device is stationary, and can be used in applications for recognizing mobile phone gestures (such as switching between horizontal and vertical screens, related games, and magnetometer gesture calibration) and in vibration-recognition-related functions (such as a pedometer or tapping); other sensors that may also be configured in the electronic device 700, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, are not described in detail herein.
The audio circuit 760, speaker 761, and microphone 762 may provide an audio interface between the user and the electronic device 700. The audio circuit 760 may transmit the electrical signal converted from received audio data to the speaker 761, where it is converted into a sound signal and output; conversely, the microphone 762 converts collected sound signals into electrical signals, which the audio circuit 760 receives and converts into audio data. After being processed by the processor 780, the audio data may be transmitted via the RF circuit 710 to, for example, another terminal, or output to the memory 720 for further processing. The audio circuit 760 may also include an earphone jack to provide communication between a peripheral earphone and the electronic device 700.
The electronic device 700 may assist the user in receiving requests, sending information, and so on through the transmission module 770 (e.g., a Wi-Fi module), which provides the user with wireless broadband internet access. Although the transmission module 770 is shown in the drawings, it is understood that it is not an essential part of the electronic device 700 and may be omitted as required without changing the essence of the application.
The processor 780 is a control center of the electronic device 700, connects various parts of the entire handset using various interfaces and lines, and performs various functions of the electronic device 700 and processes data by running or executing software programs and/or modules stored in the memory 720 and invoking data stored in the memory 720, thereby performing overall monitoring of the electronic device. Optionally, the processor 780 may include one or more processing cores; in some embodiments, the processor 780 may integrate an application processor that primarily processes operating systems, user interfaces, applications, and the like, with a modem processor that primarily processes wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 780.
The electronic device 700 also includes a power supply 790 (e.g., a battery) that provides power to the various components, and in some embodiments, may be logically coupled to the processor 780 through a power management system to perform functions such as managing charging, discharging, and power consumption by the power management system. Power supply 790 may also include one or more of any components, such as a dc or ac power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
Although not shown, the electronic device 700 further includes a camera (e.g., a front camera and a rear camera), a Bluetooth module, and the like, which are not described in detail herein. In particular, in this embodiment, the display unit of the electronic device is a touch screen display, and the electronic device further includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
extracting features of the voice to be recognized to obtain feature vectors;
determining an intention set corresponding to the voice to be recognized according to the feature vector, wherein the intention set comprises a plurality of intentions and probability values corresponding to each intention;
inputting the intention set into a classification model to obtain a correlation value for each intention;
and determining the intention corresponding to the voice to be recognized according to the obtained correlation value.
In the implementation, each module may be implemented as an independent entity, or may be combined arbitrarily, and implemented as the same entity or several entities, and the implementation of each module may be referred to the foregoing method embodiment, which is not described herein again.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor. To this end, an embodiment of the present application provides a storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any one of the embodiments of the intent recognition method provided by the embodiment of the present application.
Wherein the storage medium may include: a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and the like.
The instructions stored in the storage medium can perform the steps in any embodiment of the intention recognition method provided by the embodiments of the present application, and can therefore achieve the beneficial effects of any of those methods, which are detailed in the previous embodiments and not repeated herein.
The foregoing describes in detail the intention recognition method, apparatus, electronic device, and storage medium provided by the embodiments of the present application, and specific examples have been applied herein to illustrate the principles and implementation of the present application; the above description of the embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, those skilled in the art will make variations to the specific embodiments and application scope in light of the ideas of the present application, and the present description should not be construed as limiting the present application. Moreover, various modifications and variations can be made without departing from the principles of the present application, and such modifications and variations are also considered to be within the scope of the application.
Claims (10)
1. An intent recognition method, comprising:
extracting features of the voice to be recognized to obtain feature vectors;
determining an intention set corresponding to the voice to be recognized according to the feature vector, wherein the intention set comprises a plurality of intentions and probability values corresponding to each intention;
inputting the intention set into a classification model to obtain a correlation value for each intention;
and determining the intention corresponding to the voice to be recognized according to the obtained correlation value.
2. The method for recognizing intention as claimed in claim 1, wherein the steps of obtaining a voice to be recognized, and extracting features of the voice to be recognized to obtain feature vectors, include:
converting the voice to be recognized into corresponding text information;
performing text correction processing on the text information;
and performing word segmentation on the corrected text information, and obtaining a feature vector according to a word segmentation result.
3. The intention recognition method according to claim 1, wherein the word segmentation processing is performed on the corrected text information, and the feature vector is obtained based on the result of the word segmentation processing, comprising:
performing word segmentation processing on the corrected text information according to the intention domain to obtain a word segmentation set corresponding to the intention domain;
extracting keywords from the word segmentation set, and determining feature vectors in the corresponding intention domains according to the keywords, wherein one feature vector corresponds to one intention domain.
4. The intent recognition method as recited in claim 3, wherein determining the set of intents corresponding to the speech to be recognized based on the feature vector includes:
and inputting each characteristic vector in the characteristic vectors into a corresponding intermediate prediction model, outputting to obtain the maximum probability intention corresponding to each characteristic vector, and obtaining the intention set corresponding to the voice to be recognized based on the maximum probability intention corresponding to each characteristic vector.
5. The method for identifying an intention as recited in claim 4, wherein the inputting the set of intents into a classification model to obtain a correlation value for each intention in the set of intents comprises:
obtaining preference information according to the historical intent data;
and inputting the intention set and the preference information into the classification model to obtain a correlation value of each intention and the preference information.
6. The method for recognizing intention according to claim 5, wherein the determining the intention corresponding to the voice to be recognized based on the correlation value comprises:
comparing the correlation value with a correlation threshold;
and screening the intention set according to the obtained comparison result to obtain the intention corresponding to the voice information to be recognized.
7. The method for recognizing intention according to any one of claims 1 to 6, wherein after determining the intention corresponding to the voice to be recognized according to the correlation value, further comprising:
and generating a response instruction according to the intention, and controlling the equipment to be responded to respond to the response instruction.
8. An intent recognition device, comprising:
the feature extraction module is used for extracting features of the voice to be recognized to obtain feature vectors;
the intention prediction module is used for determining an intention set corresponding to the voice to be recognized according to the feature vector, wherein the intention set comprises a plurality of intentions and a probability value corresponding to each intention;
the association determining module is used for inputting the intention set into the classification model to obtain a correlation value for each intention;
and the intention determining module is used for determining the intention corresponding to the voice to be recognized according to the obtained correlation value.
9. An electronic device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the memory being coupled to the processor and the processor implementing the steps in the intent recognition method as claimed in any one of claims 1 to 7 when the computer program is executed by the processor.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, wherein the computer program, when run, controls a device on which the computer readable storage medium resides to perform the steps in the intention recognition method as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210118122.5A CN116612751A (en) | 2022-02-08 | 2022-02-08 | Intention recognition method, device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116612751A (en) | 2023-08-18
Family
ID=87673398
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210118122.5A Pending CN116612751A (en) | 2022-02-08 | 2022-02-08 | Intention recognition method, device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116612751A (en) |
- 2022-02-08: CN application CN202210118122.5A filed; publication CN116612751A (en), status Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |