CN115862628A - Intention recognition method, device, equipment and storage medium - Google Patents

Publication number
CN115862628A
CN115862628A CN202210368888.9A CN202210368888A CN115862628A CN 115862628 A CN115862628 A CN 115862628A CN 202210368888 A CN202210368888 A CN 202210368888A CN 115862628 A CN115862628 A CN 115862628A
Authority
CN
China
Prior art keywords
intention
recognition
audio
audio file
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210368888.9A
Other languages
Chinese (zh)
Inventor
张鹏飞
曲玉妹
沈耀飞
张磊
夏溧
井绪海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongguancun Kejin Technology Co Ltd
Original Assignee
Beijing Zhongguancun Kejin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongguancun Kejin Technology Co Ltd
Priority to CN202210368888.9A
Publication of CN115862628A

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses an intention recognition method, apparatus, device and storage medium. The intention recognition method includes: acquiring an audio file to be recognized; performing speech recognition on the audio file to be recognized to obtain an audio recognition result; and executing a preset intention recognition filter chain to call an intention matcher, which performs intention matching on the audio recognition result to obtain a first intention recognition result. The method can improve the accuracy of the intention recognition result while saving labor and time costs.

Description

Intention recognition method, device, equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an intention recognition method, apparatus, device, and storage medium.
Background
With the development of AI technology, the perception capabilities of intelligent products are becoming stronger: they can perceive human speech, body language, gestures, facial expressions, eye movements and so on, which makes natural human-machine interaction possible. Future intelligent products are expected to have affective computing capability (Affective Computing), adjusting their feedback to meet a person's needs at that moment by recognizing voice information, facial expressions, body movements and the like, so that interaction becomes easier and the products understand people better.
Intelligent interaction technology has many application scenarios; common ones today include smart homes, smart speakers, and self-service business handling. In each scenario, the client's intention must first be recognized, and further interaction then proceeds according to the intention recognition result. Intention recognition is commonly performed with an NLP (Natural Language Processing) intention recognition algorithm. However, existing NLP intention recognition algorithms are generally trained on corpus files from specific industries and specific scenarios and are not universal, so using an existing NLP algorithm directly for intention recognition yields poor accuracy. Conversely, retraining a suitable NLP intention recognition algorithm on manually constructed training samples incurs high labor and time costs.
Disclosure of Invention
The invention mainly aims to provide an intention identification method, an intention identification device, intention identification equipment and a storage medium, and aims to save labor and time cost while improving the accuracy of an intention identification result.
To achieve the above object, the present invention provides an intention identifying method including:
acquiring an audio file to be identified;
carrying out voice recognition on the audio file to be recognized to obtain an audio recognition result;
and executing a preset intention recognition filter chain to call an intention matcher to perform intention matching on the audio recognition result to obtain a first intention recognition result.
Preferably, before the step of performing speech recognition on the audio file to be recognized to obtain an audio recognition result, the intention recognition method further includes:
performing VAD detection on the audio file to be identified to obtain a detection result;
judging whether the audio file to be identified is blank audio according to the detection result;
if the audio file to be identified is not blank audio, executing the step of: carrying out voice recognition on the audio file to be recognized to obtain an audio recognition result.
Preferably, before the step of performing speech recognition on the audio file to be recognized to obtain an audio recognition result, the intention recognition method further includes:
judging whether a silent segment exists at the beginning and/or the end of the audio file to be identified according to the detection result;
if the beginning and/or the end of the audio file to be identified have the silent segments, determining to obtain a silent period according to the detection result;
intercepting the audio file to be identified according to the silent time period to obtain a target audio file;
the step of performing voice recognition on the audio file to be recognized to obtain an audio recognition result comprises the following steps:
and performing voice recognition on the target audio file to obtain an audio recognition result.
Preferably, before the step of executing a preset intent recognition filter chain to invoke an intent matcher to perform intent matching on the audio recognition result to obtain a first intent recognition result, the intent recognition method further includes:
acquiring answer intention information, wherein the answer intention information comprises expected answers and intention types;
constructing the expected answer into a state tree of a finite state machine according to a finite state machine algorithm, and obtaining an intention matcher corresponding to each intention type based on the state tree;
and assembling the intention matcher according to a preset filter sequence and the intention type to obtain the preset intention recognition filter chain.
Preferably, the executing a preset intention recognition filter chain to call an intention matcher to perform intention matching on the audio recognition result to obtain a first intention recognition result includes:
and executing a preset intention recognition filter chain, sequentially calling corresponding intention matchers according to a preset filter sequence to carry out intention matching on the audio recognition result until an expected answer in the state tree of a called intention matcher is successfully matched with the audio recognition result, at which point matching stops and a first intention recognition result is output.
Preferably, the intention identifying method further includes:
detecting whether the first intention recognition result is empty;
if the first intention identification result is empty, converting the audio identification result from the Chinese characters into pinyin to obtain a first pinyin text;
obtaining similar pinyins of all pinyins in the first pinyin text, and constructing to obtain a second pinyin text according to the first pinyin text and the similar pinyins;
and executing a preset intention recognition filter chain to call an intention matcher to perform intention matching on the second Pinyin text to obtain a second intention recognition result.
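The pinyin fallback described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the character-to-pinyin table `CHAR_TO_PINYIN`, the confusable-syllable table `SIMILAR`, and the function names are all hypothetical, and the text does not specify how the second pinyin text is built from the first pinyin text and the similar pinyins. Here each substitution variant is simply treated as one candidate second pinyin text.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

# Hypothetical tiny lookup tables; a real system would use a full pinyin dictionary.
CHAR_TO_PINYIN: Dict[str, str] = {"是": "shi", "四": "si", "的": "de"}
SIMILAR: Dict[str, List[str]] = {"si": ["shi"], "shi": ["si"]}


def to_pinyin(text: str) -> List[str]:
    """First pinyin text: one syllable per known Chinese character."""
    return [CHAR_TO_PINYIN[ch] for ch in text if ch in CHAR_TO_PINYIN]


def pinyin_variants(syllables: List[str]) -> List[str]:
    """Candidate second pinyin texts: each syllable may be swapped for a similar one."""
    options = [[s] + SIMILAR.get(s, []) for s in syllables]
    return [" ".join(combo) for combo in product(*options)]


def recognize_with_fallback(
    text: str,
    first_result: Optional[str],
    match_pinyin: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Run the pinyin fallback only when the first intention result is empty."""
    if first_result is not None:
        return first_result
    for candidate in pinyin_variants(to_pinyin(text)):
        intent = match_pinyin(candidate)
        if intent is not None:
            return intent
    return None


# Hypothetical matcher: expected answers stored as pinyin strings.
expected_pinyin = {"shi de": "positive"}
second_result = recognize_with_fallback("四的", None, expected_pinyin.get)
```

In this sketch a misrecognized "四的" still reaches the positive intention through the variant "shi de". The number of variants grows quickly with text length, so a practical version would likely need to bound it.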
Further, to achieve the above object, the present invention also provides an intention identifying apparatus comprising:
the file acquisition module is used for acquiring an audio file to be identified;
the voice recognition module is used for carrying out voice recognition on the audio file to be recognized to obtain an audio recognition result;
and the intention recognition module is used for executing a preset intention recognition filter chain so as to call an intention matcher to perform intention matching on the audio recognition result to obtain a first intention recognition result.
Further, to achieve the above object, the present invention also provides an intention recognition device comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, carries out the steps of the intention recognition method as described above.
Furthermore, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the intention identifying method as described above.
Furthermore, to achieve the above object, the present invention also provides a computer program product comprising a computer program which, when being executed by a processor, realizes the steps of the intention identifying method as described above.
The invention provides an intention recognition method, apparatus, device, storage medium and product. The invention builds an intention recognition filter chain by combining the chain-of-responsibility pattern with a finite state machine and uses it to recognize intentions. Compared with recognition by an NLP intention recognition algorithm, this can greatly improve the accuracy of the intention recognition result. At the same time, no training samples need to be constructed manually and no NLP intention recognition algorithm needs to be trained, which greatly saves labor and time costs.
Drawings
FIG. 1 is a schematic diagram of an apparatus architecture of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of an intent recognition method according to the present invention;
FIG. 3 is a functional block diagram of a first embodiment of the intent recognition apparatus according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Referring to fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
The intention recognition device in the embodiment of the present invention may be a server, or may be a terminal device such as a PC (Personal Computer), a tablet Computer, or a portable Computer.
As shown in fig. 1, the intention recognition device may include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to implement connection communication among these components. The user interface 1003 may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
It will be appreciated by those skilled in the art that the configuration of the intent recognition device illustrated in FIG. 1 is not intended to be limiting of the intent recognition device and may include more or fewer components than illustrated, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, the memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a computer program.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client and performing data communication with the client; and the processor 1001 may be configured to invoke the computer program stored in the memory 1005 and perform the following operations:
acquiring an audio file to be identified;
carrying out voice recognition on the audio file to be recognized to obtain an audio recognition result;
and executing a preset intention recognition filter chain to call an intention matcher to perform intention matching on the audio recognition result to obtain a first intention recognition result.
Further, the processor 1001 may call the computer program stored in the memory 1005, and also perform the following operations:
performing VAD detection on the audio file to be identified to obtain a detection result;
judging whether the audio file to be identified is blank audio according to the detection result;
and if the audio file to be recognized is not blank audio, performing voice recognition on the audio file to be recognized to obtain an audio recognition result.
Further, the processor 1001 may call the computer program stored in the memory 1005, and also perform the following operations:
judging whether a silent segment exists at the beginning and/or the end of the audio file to be identified according to the detection result;
if the beginning and/or the end of the audio file to be identified have the silent segments, determining to obtain a silent period according to the detection result;
intercepting the audio file to be identified according to the silent time period to obtain a target audio file;
and carrying out voice recognition on the target audio file to obtain an audio recognition result.
Further, the processor 1001 may call the computer program stored in the memory 1005, and also perform the following operations:
acquiring answer intention information, wherein the answer intention information comprises expected answers and intention types;
constructing the expected answer into a state tree of a finite state machine according to a finite state machine algorithm, and obtaining an intention matcher corresponding to each intention type based on the state tree;
and assembling the intention matcher according to a preset filter sequence and the intention type to obtain the preset intention recognition filter chain.
Further, the processor 1001 may call the computer program stored in the memory 1005, and also perform the following operations:
and executing a preset intention recognition filter chain, sequentially calling corresponding intention matchers according to a preset filter sequence to carry out intention matching on the audio recognition result until an expected answer in the state tree of a called intention matcher is successfully matched with the audio recognition result, at which point matching stops and a first intention recognition result is output.
Further, the processor 1001 may call the computer program stored in the memory 1005, and also perform the following operations:
detecting whether the first intention recognition result is empty;
if the first intention identification result is empty, converting the audio identification result from the Chinese characters into pinyin to obtain a first pinyin text;
obtaining similar pinyin of each pinyin in the first pinyin text, and constructing a second pinyin text according to the first pinyin text and the similar pinyin;
and executing a preset intention recognition filter chain to call an intention matcher to perform intention matching on the second Pinyin text to obtain a second intention recognition result.
Based on the above hardware structure, embodiments of the intent recognition method of the present invention are presented.
The invention provides an intention identification method.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of the method for intent recognition according to the present invention.
In this embodiment, the intention identifying method includes:
step S10, acquiring an audio file to be identified;
the intention identifying method of the present embodiment is implemented by an intention identifying device, which may be a server, or a terminal device such as a PC (Personal Computer), a tablet Computer, or a portable Computer.
In this embodiment, first, an audio file to be identified is obtained, where the audio file to be identified is a recorded audio file, and the content of the audio file is a response made by a client to a question.
Step S20, carrying out voice recognition on the audio file to be recognized to obtain an audio recognition result;
after the audio file to be recognized is obtained, voice recognition is carried out on the audio file to be recognized, and an audio recognition result is obtained. During Speech Recognition, an ASR (Automatic Speech Recognition) algorithm may be used for Recognition, and a corresponding interface may be called for Speech Recognition, and of course, during specific implementation, the audio file to be recognized may also be sent to a corresponding professional platform for Speech Recognition, and then an audio Recognition result returned by the professional platform is received. The audio recognition result is the text corresponding to the response made by the client to the question.
And step S30, executing a preset intention recognition filter chain to call an intention matcher to perform intention matching on the audio recognition result to obtain a first intention recognition result.
After the voice recognition is performed to obtain the audio recognition result, a preset intention recognition filter chain is executed, where the preset intention recognition filter chain is assembled in advance, and the specific assembling process may refer to the following fourth embodiment, which is not described herein again.
The preset intention recognition filter chain comprises filters and intention matchers corresponding to the various intention types. When the chain is executed, the filters corresponding to the intention types are entered one after another according to the preset filter sequence. On entering a filter, the corresponding intention matcher is called, and the expected answers in that matcher's state tree are matched against the audio recognition result. Matching stops as soon as an expected answer in the state tree of a called intention matcher matches the audio recognition result, and the intention recognition result is output, namely the intention type corresponding to the successful intention matcher. To distinguish it from the later intention recognition result obtained from the second pinyin text, this result is recorded as the first intention recognition result.
For example, suppose the preset intention recognition filter chain includes n filters whose corresponding intention types are, in order, negative, cyclic, positive, and so on. During intention recognition, the filter corresponding to the negative intention is entered first; when this filter runs, the intention matcher for the negative intention is dynamically obtained and used for matching. If the negative-intention matcher fails to match, the filter corresponding to the cyclic intention is entered, and the intention matcher for the cyclic intention is dynamically obtained and used for matching. If the cyclic-intention matcher matches successfully, the first intention recognition result is output as the cyclic intention; if it fails, the next filter (i.e., the positive filter) is entered, and so on until a match succeeds. If every matcher fails, the first intention recognition result is output as empty.
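The walkthrough above can be traced with a small sketch. It assumes a chain represented as an ordered list of (intention type, matcher) pairs and uses simple substring matchers as stand-ins for the state-tree matchers; `run_filter_chain`, `keyword_matcher` and the sample phrases are hypothetical names and data chosen for illustration.

```python
from typing import Callable, List, Optional, Tuple

Matcher = Callable[[str], bool]


def keyword_matcher(*phrases: str) -> Matcher:
    """Stand-in matcher: succeeds if any phrase occurs in the recognized text."""
    return lambda text: any(phrase in text for phrase in phrases)


def run_filter_chain(
    chain: List[Tuple[str, Matcher]], text: str
) -> Tuple[Optional[str], List[str]]:
    """Enter filters in preset order; stop at the first matcher that succeeds.

    Returns the first intention recognition result (None if every matcher
    fails) together with the list of filters that were entered.
    """
    visited: List[str] = []
    for intent_type, matcher in chain:
        visited.append(intent_type)
        if matcher(text):
            return intent_type, visited
    return None, visited


# Preset filter order from the example: negative, then cyclic, then positive.
chain = [
    ("negative", keyword_matcher("no", "never")),
    ("cyclic", keyword_matcher("say that again", "repeat")),
    ("positive", keyword_matcher("yes", "sure")),
]

result, visited = run_filter_chain(chain, "please repeat that")
```

For the input shown, the negative filter is entered and fails, the cyclic filter then succeeds, and the positive filter is never reached.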
The embodiment of the invention provides an intention recognition method: an audio file to be recognized is first acquired, speech recognition is performed on it to obtain an audio recognition result, and a preset intention recognition filter chain is then executed to call an intention matcher, which performs intention matching on the audio recognition result to obtain a first intention recognition result. The preset intention recognition filter chain is built with the chain-of-responsibility pattern, and the intention matchers are built on a finite state machine. Compared with recognition by an NLP intention recognition algorithm, combining the two in this way to recognize intentions can greatly improve the accuracy of the intention recognition result.
Further, based on the above-described first embodiment, a second embodiment of the intention identifying method of the present invention is proposed.
In this embodiment, before the step S20, the intention identifying method further includes:
step A, performing VAD detection on the audio file to be identified to obtain a detection result;
in this embodiment, after the audio file to be recognized is obtained, before the algorithm is called to perform Voice recognition, VAD (Voice Activity Detection) Detection is performed on the audio file to be recognized to obtain a Detection result, so as to recognize a silence segment from the audio signal stream.
B, judging whether the audio file to be identified is blank audio according to the detection result;
if the audio file to be identified is not a blank audio, executing step S20: and carrying out voice recognition on the audio file to be recognized to obtain an audio recognition result.
After the VAD detection result is obtained, whether the audio file to be identified is blank audio is judged according to the detection result. Specifically, the total duration of the silent segments in the audio file is compared with the product of the file's total duration and a preset proportion (which can be set according to actual needs). If the total duration of the silent segments is greater than or equal to this product, the audio file to be identified is judged to be blank audio; if it is less than the product, the audio file is judged not to be blank audio.
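The blank-audio rule can be written down in a few lines. The sketch below assumes the VAD detection result is available as a list of silent segments given as (start, end) times in seconds; the function name and the default proportion of 0.9 are illustrative choices, not values from the text.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds) of one silent segment


def is_blank_audio(
    silent_segments: List[Segment],
    total_duration: float,
    preset_ratio: float = 0.9,  # hypothetical default; set according to actual needs
) -> bool:
    """Blank audio: total silence >= total duration * preset proportion."""
    silent_total = sum(end - start for start, end in silent_segments)
    return silent_total >= total_duration * preset_ratio


mostly_silent = is_blank_audio([(0.0, 4.8)], total_duration=5.0)
has_speech = is_blank_audio([(0.0, 0.5), (4.5, 5.0)], total_duration=5.0)
```

Here the first file is 4.8 s of silence out of 5 s and is treated as blank, while the second has only 1 s of silence and proceeds to speech recognition.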
Further, if the audio file to be identified is not a blank audio, continuing to execute the following steps: and carrying out voice recognition on the audio file to be recognized to obtain an audio recognition result. For the specific implementation process, reference may be made to the first embodiment, which is not described herein again.
Further, if the audio file to be identified is blank audio, execution of the subsequent steps stops and error prompt information is generated to indicate that the audio file to be identified is blank audio; no recognition is needed, which avoids wasting algorithm resources.
In this embodiment, VAD detection is performed on the audio file to be recognized; subsequent processing is skipped when blank audio is detected and continues only when the audio file is found not to be blank audio.
Further, based on the above-described second embodiment, a third embodiment of the intention identifying method of the present invention is proposed.
In this embodiment, before the step S20, the intention identifying method further includes:
step C, judging whether a silent segment exists at the beginning and/or the end of the audio file to be identified according to the detection result;
in this embodiment, after obtaining the VAD detection result and detecting the blank audio, if the audio file to be identified is not the blank audio, further, whether a silence segment exists at the beginning and/or the end of the audio file to be identified may be determined according to the detection result.
Step D, if a silent segment exists at the beginning and/or the end of the audio file to be identified, determining to obtain a silent period according to the detection result;
step E, intercepting the audio file to be identified according to the silent time period to obtain a target audio file;
at this time, step S20 may include:
and carrying out voice recognition on the target audio file to obtain an audio recognition result.
If a silent segment exists at the beginning and/or the end of the audio file to be identified, the silent period is determined according to the detection result, namely the period corresponding to the beginning silent segment and/or the end silent segment. The audio file to be identified is then intercepted according to the silent period, that is, the silent segments at the beginning and/or the end are removed and the non-silent part in the middle is retained, to obtain the target audio file. Speech recognition is then performed on the target audio file to obtain an audio recognition result, and the subsequent steps are executed; for the specific execution process, reference may be made to the first embodiment, which is not described herein again.
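A minimal sketch of the trimming step follows, again assuming the VAD result is a list of silent (start, end) segments in seconds and that the audio is held as a flat list of samples. The names `speech_window` and `trim_samples` and the tolerance `eps` are hypothetical. Silence in the middle of the file is deliberately left untouched, and a file that is entirely silent is assumed to have been rejected by the earlier blank-audio check.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds) of one silent segment


def speech_window(
    silent_segments: List[Segment], total_duration: float, eps: float = 1e-3
) -> Tuple[float, float]:
    """Return (start, end) of the audio kept after dropping leading/trailing silence."""
    start, end = 0.0, total_duration
    for seg_start, seg_end in sorted(silent_segments):
        if seg_start <= eps:                    # silence touching the beginning
            start = max(start, seg_end)
        if seg_end >= total_duration - eps:     # silence touching the end
            end = min(end, seg_start)
    return start, end


def trim_samples(
    samples: Sequence[int], sample_rate: int, window: Tuple[float, float]
) -> Sequence[int]:
    """Cut the sample sequence down to the retained window (the target audio)."""
    start, end = window
    return samples[round(start * sample_rate):round(end * sample_rate)]


window = speech_window([(0.0, 0.8), (2.0, 2.3), (4.1, 5.0)], total_duration=5.0)
target = trim_samples(list(range(50)), sample_rate=10, window=window)
```

Only the shorter target audio would then be sent to speech recognition, which is where the cost saving described in this embodiment comes from.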
By the method, whether the beginning and/or the end of the audio file to be recognized have the silent segments or not is detected, and if the silent segments exist, the silent segments are segmented out, so that the waste of part of algorithm resources is further avoided, the waste of expense for calling the voice recognition algorithm by an enterprise is further avoided, and the enterprise cost can be saved.
Further, based on the first to third embodiments described above, a fourth embodiment of the intention identification method of the present invention is proposed.
In this embodiment, before the step S30, the intention identifying method further includes:
step F, obtaining answer intention information, wherein the answer intention information comprises expected answers and intention types;
step G, constructing the expected answer into a state tree of a finite state machine according to a finite state machine algorithm, and obtaining an intention matcher corresponding to each intention type based on the state tree;
and H, assembling the intention matcher according to a preset filter sequence and the intention type to obtain the preset intention recognition filter chain.
In this embodiment, answer intention information is obtained, where the answer intention information includes an expected answer and an intention type, and an expression manner of the answer intention information may be in a form of a rule array. For example, the answer intention information may be:
[Figure: example answer intention information in rule-array form, with the fields answer and purposeType; image not reproduced]
It should be noted that answer is the expected answer; it may contain one or more answers, and multiple answers are separated by the "/" character. purposeType is the intention type, and different numbers can be assigned to different intention types: for example, 1 indicates positive, 2 indicates negative, 3 indicates cyclic, and 4 indicates silent.
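Since the original example is not reproduced here, the following is a purely hypothetical rule array that follows the description: only the field names `answer` and `purposeType`, the "/" separator, and the numeric codes come from the text, while the sample answers and helper names are invented for illustration.

```python
from typing import Dict, List

# Intention type codes as described in the text.
INTENT_NAMES: Dict[int, str] = {1: "positive", 2: "negative", 3: "cyclic", 4: "silent"}

# Hypothetical answer intention information in rule-array form.
answer_intent_rules: List[dict] = [
    {"answer": "yes/sure/ok", "purposeType": 1},
    {"answer": "no/not interested", "purposeType": 2},
    {"answer": "say that again/pardon", "purposeType": 3},
]


def expected_answers_by_intent(rules: List[dict]) -> Dict[str, List[str]]:
    """Split each rule's answer field on "/" and group the answers by intention name."""
    grouped: Dict[str, List[str]] = {}
    for rule in rules:
        name = INTENT_NAMES[rule["purposeType"]]
        grouped.setdefault(name, []).extend(rule["answer"].split("/"))
    return grouped


grouped = expected_answers_by_intent(answer_intent_rules)
```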
After the answer intention information is obtained, the expected answer can be constructed into a state tree of the finite state machine according to a finite state machine algorithm, and the intention matcher corresponding to each filter is obtained based on the state tree.
Specifically, the expected answers may be split into an array on the "/" separator, and the elements of the array are then built into an intention matching tree whose root node is root; each expected answer is a leaf node on the state tree of the corresponding intention matcher.
Further, to reduce the influence of dialects or accents on the accuracy of the intention recognition result, the intention matcher's state tree need not be built from the expected answers alone: every element of the expected-answer array can also be converted into pinyin, and the pinyin of each element is then added to the state tree. Constructing the intention matcher in this way can further improve the accuracy of subsequent intention recognition results.
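One way to picture the state tree is as a character-level trie in which every expected answer ends at a marked node. The granularity of the tree and the matching rule are not spelled out in the text, so the sketch below makes two assumptions: states are individual characters, and a match succeeds when any expected answer occurs anywhere in the recognized text. `StateNode`, `build_state_tree` and `matches` are hypothetical names.

```python
from typing import Dict


class StateNode:
    """One state in the tree; children are keyed by the next character."""

    def __init__(self) -> None:
        self.children: Dict[str, "StateNode"] = {}
        self.is_answer = False  # True when an expected answer ends at this state


def build_state_tree(expected_answers: str) -> StateNode:
    """Split the answers on "/" and insert each one below the root node."""
    root = StateNode()
    for answer in expected_answers.split("/"):
        answer = answer.strip()
        if not answer:
            continue  # ignore empty elements such as a trailing "/"
        node = root
        for ch in answer:
            node = node.children.setdefault(ch, StateNode())
        node.is_answer = True
    return root


def matches(root: StateNode, text: str) -> bool:
    """Walk the tree from every start position; succeed on the first full answer."""
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.get(ch)
            if node is None:
                break
            if node.is_answer:
                return True
    return False


positive_tree = build_state_tree("yes/sure/ok")
hit = matches(positive_tree, "yes please")
miss = matches(positive_tree, "no thanks")
```

Substring matching of this kind is permissive (for example, "ok" would also be found inside a longer word), so a stricter variant might require word boundaries or a whole-utterance match.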
After the intention matcher is constructed, the intention matcher is assembled according to a preset filter sequence and intention types according to a responsibility chain mode to obtain a preset intention recognition filter chain, and finally, the preset intention recognition filter chain comprises filters and intention matchers corresponding to various intention types.
It should be noted that the Chain of Responsibility pattern avoids coupling the request sender to multiple request handlers: each handler holds a reference to the next one, so that all handlers are linked into a chain. When a request occurs, it is passed along the chain until some object handles it. Chain of Responsibility is an object behavioral pattern; the client only needs to send the request to the chain and need not care about how the request is handled or passed along, since the request is forwarded automatically. The chain thus decouples the sender of a request from its handlers.
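The structure of the pattern, in which each handler remembers a reference to its successor, can be sketched as below. `IntentFilter` and `assemble_chain` are hypothetical names, and the lambda matchers stand in for the finite-state-machine matchers; this is one common way to write the pattern rather than a description of the patented implementation.

```python
from typing import Callable, List, Optional, Tuple

Matcher = Callable[[str], bool]


class IntentFilter:
    """One link in the chain: handles the request or forwards it to the next link."""

    def __init__(
        self,
        intent_type: str,
        matcher: Matcher,
        next_filter: Optional["IntentFilter"] = None,
    ) -> None:
        self.intent_type = intent_type
        self.matcher = matcher
        self.next_filter = next_filter  # reference to the next handler in the chain

    def handle(self, text: str) -> Optional[str]:
        if self.matcher(text):
            return self.intent_type
        if self.next_filter is not None:
            return self.next_filter.handle(text)
        return None  # end of chain reached without a match


def assemble_chain(ordered: List[Tuple[str, Matcher]]) -> Optional[IntentFilter]:
    """Link filters in the preset order and return the head of the chain."""
    head: Optional[IntentFilter] = None
    for intent_type, matcher in reversed(ordered):
        head = IntentFilter(intent_type, matcher, head)
    return head


chain = assemble_chain([
    ("negative", lambda text: "no" in text),
    ("positive", lambda text: "yes" in text),
])
```

The caller only ever invokes `chain.handle(text)` on the head of the chain; adding a new intention type means adding one more (type, matcher) pair to the ordered list, with no change to the caller.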
Further, step S30 may include:
step a31, executing a preset intention recognition filter chain, sequentially calling corresponding intention matchers according to a preset filter sequence to perform intention matching on the audio recognition result until an expected answer in the state tree of a called intention matcher is successfully matched with the audio recognition result, at which point matching stops and a first intention recognition result is output.
In this embodiment, intention recognition on the audio recognition result obtained from speech recognition proceeds as follows. The preset intention recognition filter chain is executed, and the filters corresponding to the intention types are entered one after another according to the preset filter sequence. On entering a filter, the corresponding intention matcher is called, and the expected answers in that matcher's state tree are matched against the audio recognition result. Matching stops as soon as an expected answer in the state tree of a called intention matcher matches the audio recognition result, and the intention recognition result is output, namely the intention type corresponding to the successful intention matcher. To distinguish it from the later intention recognition result obtained from the second pinyin text, this result is recorded as the first intention recognition result.
In this embodiment, the intention recognition filter chain is constructed dynamically using the chain of responsibility pattern, and the expected answers to be matched are built into a state tree by a finite state machine algorithm; intention matching is then performed matcher by matcher until a match succeeds, yielding the intention recognition result. Because the filter chain combines the chain of responsibility pattern with a finite state machine, the accuracy of the intention recognition result can be greatly improved compared with recognition methods that adopt an NLP intention recognition algorithm. At the same time, no training samples need to be constructed manually and no suitable NLP intention recognition algorithm needs to be obtained through training, so labor and time costs can be greatly reduced.
In addition, the intention recognition filter chain formed by combining the chain of responsibility pattern with a finite state machine is extensible: it can load various intention recognition rules and perform dynamic intention recognition, and can therefore meet the intention recognition requirements of different industries and different services. That is, the intention recognition method described above has wider applicability.
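The filter chain and state tree described in this embodiment can be sketched roughly as follows. This is a hedged illustration under stated assumptions, not the patented implementation: the state tree is modeled as a character trie, a match is assumed to succeed when any expected answer occurs as a contiguous substring of the recognized text, and the names `StateTree`, `IntentMatcher`, `build_filter_chain`, `recognize_intent` and the sample answers are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


class StateTree:
    """Trie of expected answers: each node is a state, each character a transition."""

    def __init__(self, expected_answers: Iterable[str]) -> None:
        self.root: Dict = {}
        for answer in expected_answers:
            if not answer:
                continue                    # ignore empty expected answers
            node = self.root
            for ch in answer:
                node = node.setdefault(ch, {})
            node[None] = True               # marks an accepting state

    def matches(self, text: str) -> bool:
        """Assumed semantics: some expected answer is a substring of text."""
        for start in range(len(text)):
            node = self.root
            for ch in text[start:]:
                if ch not in node:
                    break                   # no transition: try the next start
                node = node[ch]
                if None in node:
                    return True             # reached an accepting state
        return False


class IntentMatcher:
    """Pairs one intention type with the state tree of its expected answers."""

    def __init__(self, intent_type: str, expected_answers: Iterable[str]) -> None:
        self.intent_type = intent_type
        self.tree = StateTree(expected_answers)

    def match(self, text: str) -> bool:
        return self.tree.matches(text)


def build_filter_chain(answer_intent_info: Dict[str, List[str]],
                       filter_order: List[str]) -> List[IntentMatcher]:
    """Assemble matchers in the preset filter order."""
    return [IntentMatcher(t, answer_intent_info[t])
            for t in filter_order if t in answer_intent_info]


def recognize_intent(chain: List[IntentMatcher], text: str) -> Optional[str]:
    """Call matchers in order and stop at the first successful match."""
    for matcher in chain:
        if matcher.match(text):
            return matcher.intent_type
    return None                             # empty result: recognition failed


# Hypothetical answer intention information and filter order.
info = {"affirm": ["是的", "可以", "好的"], "deny": ["不是", "不需要", "不用"]}
chain = build_filter_chain(info, ["deny", "affirm"])
first_result = recognize_intent(chain, "不是的")
```

In this sketch the filter order matters: "不是的" also contains the affirmative answer "是的", so placing the `deny` matcher first is what makes the chain return `deny`. New intention types can be supported by adding entries to `info` and `filter_order` without changing the matching code.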
Further, a fifth embodiment of the intention identifying method of the present invention is proposed based on the above-described first to third embodiments.
In this embodiment, after the step S30, the intention identifying method further includes:
Step I, detecting whether the first intention recognition result is empty;
In this embodiment, after the first intention recognition result is obtained, it is detected whether the result is empty, that is, whether intention recognition succeeded.
Step J, if the first intention recognition result is empty, converting the audio recognition result from Chinese characters into pinyin to obtain a first pinyin text;
Step K, obtaining the similar pinyin of each pinyin in the first pinyin text, and constructing a second pinyin text according to the first pinyin text and the similar pinyin;
Step L, executing the preset intention recognition filter chain to call the intention matcher to perform intention matching on the second pinyin text to obtain a second intention recognition result.
If the first intention recognition result is empty, intention recognition has failed, probably because dialect or accent made the speech recognition result inaccurate. In that case, the audio recognition result is converted from Chinese characters into pinyin to obtain a first pinyin text. The similar pinyin of each pinyin in the first pinyin text is then obtained; specifically, it can be determined from a preset list of similar-sounding pinyin. A second pinyin text is constructed from the first pinyin text and the similar pinyin, and the preset intention recognition filter chain is executed to call the intention matcher to perform intention matching on the second pinyin text, obtaining a second intention recognition result. The specific intention recognition process is similar to the process described above and is not repeated here.
In this embodiment, the audio recognition result is converted from Chinese characters into pinyin to obtain the first pinyin text, the first pinyin text is expanded with the corresponding similar pinyin to obtain the second pinyin text, and the preset intention recognition filter chain is then executed on the second pinyin text to obtain the second intention recognition result. In this way, the accuracy of the intention recognition result can be further improved.
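Steps I to L can be sketched as below. The text does not specify how the second pinyin text is built or how expected answers are represented in pinyin, so this sketch makes several assumptions: the lookup tables `CHAR_TO_PINYIN` and `SIMILAR_PINYIN` are tiny hypothetical stand-ins for a full pinyin dictionary and the preset similar-sound list, the second pinyin text is taken to be the set of candidate syllable sequences produced by substituting similar pinyin, and a simple syllable-sequence match stands in for the filter chain.

```python
from itertools import product
from typing import Dict, List, Optional

# Hypothetical toy tables for illustration only.
CHAR_TO_PINYIN = {"是": "shi", "的": "de", "四": "si", "不": "bu", "要": "yao"}
SIMILAR_PINYIN = {"si": ["shi"], "shi": ["si"]}
# Expected answers per intention type, written as pinyin syllable lists.
EXPECTED = {"affirm": [["shi", "de"]], "deny": [["bu", "yao"]]}


def to_pinyin(text: str) -> List[str]:
    """Step J: Chinese characters to the first pinyin text (unknown chars skipped)."""
    return [CHAR_TO_PINYIN[ch] for ch in text if ch in CHAR_TO_PINYIN]


def expand_similar(syllables: List[str]) -> List[List[str]]:
    """Step K (assumed form): every combination of original and similar pinyin."""
    options = [[s] + SIMILAR_PINYIN.get(s, []) for s in syllables]
    return [list(combo) for combo in product(*options)]


def contains(seq: List[str], sub: List[str]) -> bool:
    """True if sub occurs as a contiguous run of syllables inside seq."""
    n = len(sub)
    return any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))


def match_pinyin(candidates: List[List[str]],
                 expected: Dict[str, List[List[str]]]) -> Optional[str]:
    """Step L stand-in: first intention type with an answer found in a candidate."""
    for intent, answers in expected.items():
        for answer in answers:
            if any(contains(c, answer) for c in candidates):
                return intent
    return None


def recognize_with_fallback(text: str, first_result: Optional[str],
                            expected: Dict[str, List[List[str]]]) -> Optional[str]:
    if first_result:                        # Step I: result not empty, keep it
        return first_result
    first_pinyin = to_pinyin(text)          # Step J
    second_pinyin = expand_similar(first_pinyin)  # Step K
    return match_pinyin(second_pinyin, expected)  # Step L


# "四的" (si de) stands for an accented rendering of "是的" (shi de).
second_result = recognize_with_fallback("四的", None, EXPECTED)
```

Note that expanding every syllable combinatorially grows quickly with utterance length; a practical system would likely bound or prune the candidates, which this sketch does not attempt.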
The invention also provides an intention recognition device.
Referring to fig. 3, fig. 3 is a functional block diagram of a first embodiment of the intention recognition apparatus according to the present invention.
As shown in fig. 3, the intention recognition apparatus includes:
the file acquisition module 10, configured to acquire an audio file to be recognized;
the voice recognition module 20, configured to perform voice recognition on the audio file to be recognized to obtain an audio recognition result;
and the intention recognition module 30, configured to execute a preset intention recognition filter chain to call an intention matcher to perform intention matching on the audio recognition result, so as to obtain a first intention recognition result.
Further, the intention recognition apparatus further includes:
the VAD detection module, configured to perform VAD detection on the audio file to be recognized to obtain a detection result;
the first judgment module, configured to judge whether the audio file to be recognized is blank audio according to the detection result;
the voice recognition module 20 is specifically configured to perform voice recognition on the audio file to be recognized to obtain an audio recognition result if the audio file to be recognized is not blank audio.
Further, the intention recognition apparatus further includes:
the second judgment module, configured to judge whether a silent segment exists at the beginning and/or the end of the audio file to be recognized according to the detection result;
the time period determining module, configured to determine a silent period according to the detection result if a silent segment exists at the beginning and/or the end of the audio file to be recognized;
the audio intercepting module, configured to intercept the audio file to be recognized according to the silent period to obtain a target audio file;
the voice recognition module 20 is further specifically configured to perform voice recognition on the target audio file to obtain an audio recognition result.
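The VAD detection, blank-audio judgment, and silence-interception modules above can be illustrated with a simple energy-based sketch. The text does not state which VAD algorithm is used, so the frame-energy rule, the frame length of 160 samples, the threshold of 1e-4, and the function names here are all assumptions chosen for illustration.

```python
from typing import List, Optional


def frame_is_speech(frame: List[float], threshold: float) -> bool:
    """Assumed VAD rule: a frame is speech if its mean energy reaches the threshold."""
    if not frame:
        return False
    energy = sum(s * s for s in frame) / len(frame)
    return energy >= threshold


def detect_speech_frames(samples: List[float], frame_len: int = 160,
                         threshold: float = 1e-4) -> List[bool]:
    """Detection result: one speech/non-speech flag per frame."""
    return [frame_is_speech(samples[i:i + frame_len], threshold)
            for i in range(0, len(samples), frame_len)]


def trim_silence(samples: List[float], frame_len: int = 160,
                 threshold: float = 1e-4) -> Optional[List[float]]:
    """Return the target audio with leading/trailing silence removed.

    Returns None for blank audio, in which case speech recognition is skipped.
    """
    flags = detect_speech_frames(samples, frame_len, threshold)
    if not any(flags):
        return None                                   # blank audio
    first = flags.index(True)                         # end of leading silence
    last = len(flags) - 1 - flags[::-1].index(True)   # start of trailing silence
    return samples[first * frame_len:(last + 1) * frame_len]


# Two silent frames, one loud frame, one silent frame.
audio = [0.0] * 320 + [0.5] * 160 + [0.0] * 160
target = trim_silence(audio)
```

Only the trimmed `target` would then be passed to speech recognition, so silent segments at either end never reach the recognizer.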
Further, the intention recognition apparatus further includes:
the information acquisition module is used for acquiring answer intention information, wherein the answer intention information comprises expected answers and intention types;
the construction module is used for constructing the expected answer into a state tree of a finite state machine according to a finite state machine algorithm and obtaining an intention matcher corresponding to each intention type based on the state tree;
and the assembling module is used for assembling the intention matcher according to a preset filter sequence and the intention type to obtain the preset intention recognition filter chain.
Further, the intention identifying module 30 is specifically configured to:
and executing a preset intention recognition filter chain, sequentially calling corresponding intention matchers according to a preset filter sequence to carry out intention matching on the audio recognition result, stopping matching until an expected answer in a state tree of the called intention matchers is successfully matched with the audio recognition result, and outputting a first intention recognition result.
Further, the intention recognition apparatus further includes:
a result detection module for detecting whether the first intention recognition result is null;
the result conversion module is used for converting the audio recognition result from Chinese characters into pinyin to obtain a first pinyin text if the first intention recognition result is empty;
the similar pinyin acquisition module is used for acquiring similar pinyins of all pinyins in the first pinyin text and constructing to obtain a second pinyin text according to the first pinyin text and the similar pinyins;
the intention identifying module 30 is further configured to execute a preset intention identifying filter chain, so as to call an intention matcher to perform intention matching on the second pinyin text, and obtain a second intention identifying result.
The function implementation of each module in the intention identifying device corresponds to each step in the intention identifying method embodiment, and the function and implementation process thereof are not described in detail herein.
The invention also provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the intent recognition method according to any of the embodiments above.
The specific embodiment of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the intention identifying method described above, and is not described herein again.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, carries out the steps of the intention recognition method according to any of the embodiments above.
The specific embodiment of the computer program product of the present invention is substantially the same as the embodiments of the intention identifying method described above, and will not be described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. Any equivalent structure or equivalent process transformation made on the basis of the present invention, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present invention.

Claims (9)

1. An intention recognition method, characterized in that the intention recognition method comprises:
acquiring an audio file to be recognized;
carrying out voice recognition on the audio file to be recognized to obtain an audio recognition result;
and executing a preset intention recognition filter chain to call an intention matcher to perform intention matching on the audio recognition result to obtain a first intention recognition result.
2. The intention recognition method according to claim 1, wherein before the step of performing voice recognition on the audio file to be recognized to obtain an audio recognition result, the intention recognition method further comprises:
performing VAD detection on the audio file to be recognized to obtain a detection result;
judging whether the audio file to be recognized is blank audio according to the detection result;
if the audio file to be recognized is not blank audio, executing the step of carrying out voice recognition on the audio file to be recognized to obtain an audio recognition result.
3. The intention recognition method of claim 2, wherein before the step of performing speech recognition on the audio file to be recognized to obtain an audio recognition result, the intention recognition method further comprises:
judging whether a silent segment exists at the beginning and/or the end of the audio file to be recognized according to the detection result;
if a silent segment exists at the beginning and/or the end of the audio file to be recognized, determining a silent period according to the detection result;
intercepting the audio file to be recognized according to the silent period to obtain a target audio file;
the step of carrying out voice recognition on the audio file to be recognized to obtain an audio recognition result comprises:
carrying out voice recognition on the target audio file to obtain an audio recognition result.
4. The intent recognition method according to any of claims 1 to 3, wherein before the step of executing a preset intent recognition filter chain to invoke an intent matcher to perform intent matching on the audio recognition result to obtain a first intent recognition result, the intent recognition method further comprises:
acquiring answer intention information, wherein the answer intention information comprises expected answers and intention types;
constructing the expected answer into a state tree of a finite state machine according to a finite state machine algorithm, and obtaining an intention matcher corresponding to each intention type based on the state tree;
and assembling the intention matcher according to a preset filter sequence and the intention type to obtain the preset intention recognition filter chain.
5. The intent recognition method according to claim 4, wherein the step of executing a preset intent recognition filter chain to invoke an intent matcher to perform intent matching on the audio recognition result to obtain a first intent recognition result comprises:
and executing a preset intention recognition filter chain, sequentially calling corresponding intention matchers according to a preset filter sequence to carry out intention matching on the audio recognition result, stopping matching until an expected answer in a state tree of the called intention matchers is successfully matched with the audio recognition result, and outputting a first intention recognition result.
6. The intention recognition method according to any one of claims 1 to 3, characterized in that the intention recognition method further comprises:
detecting whether the first intention recognition result is empty;
if the first intention recognition result is empty, converting the audio recognition result from Chinese characters into pinyin to obtain a first pinyin text;
obtaining the similar pinyin of each pinyin in the first pinyin text, and constructing a second pinyin text according to the first pinyin text and the similar pinyin;
and executing the preset intention recognition filter chain to call the intention matcher to perform intention matching on the second pinyin text to obtain a second intention recognition result.
7. An intention recognition apparatus characterized by comprising:
the file acquisition module is used for acquiring an audio file to be recognized;
the voice recognition module is used for carrying out voice recognition on the audio file to be recognized to obtain an audio recognition result;
and the intention recognition module is used for executing a preset intention recognition filter chain so as to call an intention matcher to perform intention matching on the audio recognition result to obtain a first intention recognition result.
8. An intention recognition device, characterized in that the intention recognition device comprises: memory, processor and computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, carries out the steps of the intention recognition method of any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps of the intent recognition method of any one of claims 1 to 6.
CN202210368888.9A 2022-04-08 2022-04-08 Intention recognition method, device, equipment and storage medium Pending CN115862628A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210368888.9A CN115862628A (en) 2022-04-08 2022-04-08 Intention recognition method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210368888.9A CN115862628A (en) 2022-04-08 2022-04-08 Intention recognition method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115862628A true CN115862628A (en) 2023-03-28

Family

ID=85660044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210368888.9A Pending CN115862628A (en) 2022-04-08 2022-04-08 Intention recognition method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115862628A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination