CN110097886B - Intention recognition method and device, storage medium and terminal - Google Patents

Intention recognition method and device, storage medium and terminal

Info

Publication number
CN110097886B
CN110097886B (application CN201910356912.5A)
Authority
CN
China
Prior art keywords
recognition result
intention
voice recognition
sentence
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910356912.5A
Other languages
Chinese (zh)
Other versions
CN110097886A (en)
Inventor
李杭泰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Xiaoai Robot Technology Co ltd
Original Assignee
Guizhou Xiaoai Robot Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Xiaoai Robot Technology Co ltd filed Critical Guizhou Xiaoai Robot Technology Co ltd
Priority to CN201910356912.5A priority Critical patent/CN110097886B/en
Publication of CN110097886A publication Critical patent/CN110097886A/en
Application granted granted Critical
Publication of CN110097886B publication Critical patent/CN110097886B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems

Abstract

An intention recognition method and apparatus, a storage medium, and a terminal are provided. The intention recognition method includes: performing initial intention recognition on a current speech recognition result of a user, the current speech recognition result being text data; when the initial intention recognition fails, determining the number of words contained in the current speech recognition result, a word being the minimum unit with semantics in the text data; when the number of words reaches a preset threshold, splitting the current speech recognition result to obtain M sentences, M being a positive integer greater than 1; performing intention recognition on the M sentences respectively to obtain N intentions, N being a positive integer less than or equal to M; and determining the intention of the current speech recognition result at least according to the N intentions. The technical solution can improve the accuracy of intention recognition.

Description

Intention recognition method and device, storage medium and terminal
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular, to an intention recognition method and apparatus, a storage medium, and a terminal.
Background
In the process of human-computer interaction by voice, the prior art performs speech recognition on the voice data input by a user using a speech engine, and directly uses the entire content obtained by speech recognition as the input of a semantic understanding engine to obtain the user's intention.
However, voice interaction is much more complex than direct text input. In a voice interaction scenario, the following situations exist: the content of a single voice interaction is too long (e.g., more than 20 words); the speech engine does not insert sentence breaks, especially for sentences with more than 20 characters; or the user speaks intermittently, so that the content picked up by the speech recognition engine in one pass does not form a complete sentence. In these three cases (a long sentence, a long sentence without sentence breaks, and content that does not form a complete sentence), the user's intention cannot be recognized, which degrades the user experience.
Disclosure of Invention
The invention solves the technical problem of how to improve the accuracy of intention identification.
In order to solve the above technical problem, an embodiment of the present invention provides an intention recognition method, including: performing initial intention recognition on a current speech recognition result of a user, the current speech recognition result being text data; when the initial intention recognition fails, determining the number of words contained in the current speech recognition result, a word being the minimum unit with semantics in the text data; when the number of words reaches a preset threshold, splitting the current speech recognition result to obtain M sentences, M being a positive integer greater than 1; performing intention recognition on the M sentences respectively to obtain N intentions, N being a positive integer less than or equal to M; and determining the intention of the current speech recognition result at least according to the N intentions.
Optionally, N is a positive integer greater than or equal to 2, and the determining the intention of the current speech recognition result at least according to the N intentions includes: calculating the importance of the sentences from which the N intentions were obtained; and selecting the intention of the sentence with the highest importance as the intention of the current speech recognition result.
Optionally, the calculating the importance of the sentences from which the N intentions were obtained includes: calculating the term frequency-inverse document frequency of each of the N sentences as the importance of that sentence.
Optionally, the determining the intention of the current speech recognition result at least according to the N intentions includes: determining the positions, in the current speech recognition result, of the sentences from which the N intentions were obtained; and selecting the intention of the last-positioned sentence as the intention of the current speech recognition result.
Optionally, the splitting the current speech recognition result includes: splitting the current speech recognition result using a preset regular expression.
Optionally, before splitting the current speech recognition result, the method further includes: judging whether the current speech recognition result has been broken into sentences with punctuation marks; and if not, performing sentence breaking on the current speech recognition result using a pre-trained sentence-breaking model.
Optionally, the intention recognition method further includes: when the number of words does not reach the preset threshold, judging whether the number of words contained in a previous speech recognition result, obtained before the current speech recognition result, reached the preset threshold and whether its intention recognition succeeded; if the number of words contained in the previous speech recognition result did not reach the preset threshold and its intention recognition failed, merging at least the current speech recognition result and the previous speech recognition result; and performing intention recognition using the merged speech recognition result.
Optionally, the merging at least the current speech recognition result and the previous speech recognition result includes: storing the current speech recognition result in a sentence list cache; and if the number of recognition results in the sentence list cache is greater than 1, merging all speech recognition results in the sentence list cache.
Optionally, the intention recognition method further includes: clearing the sentence list cache if the intention recognition of the merged speech recognition result succeeds; or clearing the sentence list cache if the number of words contained in the previous speech recognition result reached the preset threshold or its intention recognition succeeded.
Optionally, the performing intention recognition using the merged speech recognition result includes: calculating the smoothness of the merged speech recognition result; and if the smoothness reaches a preset threshold, performing intention recognition using the merged speech recognition result.
In order to solve the above technical problem, an embodiment of the present invention further discloses an intention recognition apparatus, including: an initial intention recognition module adapted to perform initial intention recognition on a current speech recognition result of a user, the current speech recognition result being text data; a word count determination module adapted to determine, when the initial intention recognition fails, the number of words contained in the current speech recognition result, a word being the minimum unit with semantics in the text data; a splitting module adapted to split the current speech recognition result when the number of words reaches a preset threshold, to obtain M sentences, M being a positive integer greater than 1; an intention recognition module adapted to perform intention recognition on the M sentences respectively to obtain N intentions, N being a positive integer less than or equal to M; and an intention determination module adapted to determine the intention of the current speech recognition result at least according to the N intentions.
The embodiment of the present invention further discloses a storage medium storing computer instructions which, when executed, perform the steps of the above intention recognition method.
The embodiment of the present invention further discloses a terminal including a memory and a processor, the memory storing computer instructions executable on the processor, the processor performing the steps of the above intention recognition method when executing the computer instructions.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
Because intention recognition tends to fail for text data whose number of words exceeds a preset threshold, the technical solution of the present invention splits the current speech recognition result according to the number of words it contains to obtain M sentences, performs intention recognition on the M sentences, and determines the intention of the current speech recognition result based on the N intentions of the M sentences. This avoids the situation in the prior art in which the intention of a long sentence (that is, a sentence whose number of words exceeds the preset threshold) cannot be recognized, improves the success rate of intention recognition for long sentences, and thereby improves the user's interactive experience.
Further, when determining the intention of the current speech recognition result according to the N intentions, the positions, in the current speech recognition result, of the sentences from which the N intentions were obtained may be determined, and the intention of the last-positioned sentence may be selected as the intention of the current speech recognition result. Considering that, by users' language expression habits, important sentences are usually placed later, selecting the intention of the last-positioned sentence when choosing the user's final intention from the N intentions helps ensure the accuracy of user intention recognition.
Further, when the number of words does not reach the preset threshold, it is judged whether the number of words contained in a previous speech recognition result, obtained before the current speech recognition result, reached the preset threshold and whether its intention recognition succeeded; if the number of words contained in the previous speech recognition result did not reach the preset threshold and its intention recognition failed, at least the current speech recognition result and the previous speech recognition result are merged, and intention recognition is performed using the merged speech recognition result. In the technical solution of the present invention, a current speech recognition result whose number of words does not reach the preset threshold and whose intention recognition failed can be merged with a previous speech recognition result whose number of words did not reach the preset threshold and whose intention recognition failed, and intention recognition can be performed again using the merged result, so as to improve the success rate of intention recognition for short sentences (that is, sentences whose number of words does not reach the preset threshold).
Drawings
FIG. 1 is a flow chart of an intent recognition method according to an embodiment of the present invention;
FIG. 2 is a flowchart of one embodiment of step S105 shown in FIG. 1;
FIG. 3 is a flowchart of another embodiment of step S105 shown in FIG. 1;
FIG. 4 is a partial flow diagram of an intent recognition method according to an embodiment of the present invention;
FIG. 5 is a flowchart of one embodiment of step S402 shown in FIG. 4;
FIG. 6 is a schematic structural diagram of an intention recognition apparatus according to an embodiment of the present invention.
Detailed Description
As described in the background art, in the above three cases (a long sentence, a long sentence without sentence breaks, and content that does not form a complete sentence), the user's intention cannot be recognized, which degrades the user experience.
Because intention recognition tends to fail for text data whose number of words exceeds a preset threshold, the technical solution of the present invention splits the current speech recognition result according to the number of words it contains to obtain M sentences, performs intention recognition on the M sentences, and determines the intention of the current speech recognition result based on the N intentions of the M sentences. This avoids the situation in the prior art in which the intention of a long sentence (that is, a sentence whose number of words exceeds the preset threshold) cannot be recognized, improves the success rate of intention recognition for long sentences, and thereby improves the user's interactive experience.
A long sentence in the embodiments of the present invention may refer to a sentence whose number of words reaches a preset threshold, for example, a sentence with more than 20 words.
A short sentence may refer to a sentence whose number of words does not reach the preset threshold, for example, a sentence with 20 or fewer words.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Fig. 1 is a flowchart of an intention identifying method according to an embodiment of the present invention.
The method shown in fig. 1 may be suitable for any scenario requiring voice interaction, such as reception halls, shopping malls, banks, airports, etc. Specifically, the steps of the method may be performed by a human-computer interaction device, such as a virtual robot, a physical robot, and the like.
The intent recognition method shown in fig. 1 may include the steps of:
step S101: performing initial intention recognition on a current voice recognition result of a user, wherein the current voice recognition result is text data;
step S102: when the initial intention recognition fails, determining the number of words contained in the current voice recognition result, wherein the words are the minimum units with semantics in the text data;
step S103: when the number of the characters reaches a preset threshold, splitting the current voice recognition result to obtain M sentences, wherein M is a positive integer greater than 1;
step S104: respectively carrying out intention identification on the M sentences to obtain N intentions, wherein N is a positive integer and is less than or equal to M;
step S105: determining an intention of the current speech recognition result at least according to the N intentions.
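For illustration, steps S101 through S105 can be sketched in Python. The helpers below (`recognize_intent`, a toy keyword engine that fails on long inputs, and `split_sentences`, a punctuation splitter) are simplified stand-ins assumed for this sketch, not the engines disclosed in the patent:

```python
import re

WORD_THRESHOLD = 20  # preset threshold on the number of words

def recognize_intent(text):
    # Toy semantic-understanding engine: fails on long inputs (the failure
    # mode the method addresses) and otherwise does a keyword lookup.
    if len(text.split()) >= WORD_THRESHOLD:
        return None
    for keyword, intent in [("railway station", "railway station route"),
                            ("weather", "weather")]:
        if keyword in text:
            return intent
    return None

def split_sentences(text):
    # Step S103: split the recognition result on punctuation into M sentences.
    return [s.strip() for s in re.split(r"[,.!?]", text) if s.strip()]

def identify(current_result):
    intent = recognize_intent(current_result)            # step S101
    if intent is not None:
        return intent
    if len(current_result.split()) >= WORD_THRESHOLD:    # steps S102-S103
        sentences = split_sentences(current_result)      # M sentences
        intents = [recognize_intent(s) for s in sentences]
        intents = [i for i in intents if i is not None]  # N intentions (S104)
        if intents:
            return intents[-1]  # step S105: pick the last-positioned intent
    return None
```

A short English utterance stands in for the Chinese examples given later in the description; with Chinese text, the word count would be a character count.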
It should be noted that the sequence numbers of the steps in this embodiment do not represent a limitation on the execution sequence of the steps.
Regarding the specific process of performing initial intention recognition on the current speech recognition result in step S101, reference may be made to related intention recognition algorithms in the prior art, which are not described in detail here.
After the intention of the current speech recognition result is obtained, an answer to the intention can be determined and fed back to the user. Specifically, a question matching the intention can be searched for in a knowledge base, and the answer to the matched question is used as the answer for the intention.
Those skilled in the art will appreciate that the process of obtaining the answer to a question using a knowledge base follows the prior art, and further description is omitted here.
In the prior art, intention recognition is performed only once on a user's speech recognition result, and no second attempt at intention recognition is made if it fails.
Unlike the prior art, in the case that the initial intention recognition fails, the embodiment of the present invention may perform intention recognition again on the current speech recognition result through steps S102 to S105 to obtain the intention of the current speech recognition result.
In a specific implementation of step S102, the number of words contained in the current speech recognition result may be determined, a word being the minimum unit with semantics in the text data. For example, when the language of the text data is Chinese, a word refers to a single Chinese character; when the language is English, a word refers to a single word; the same applies to other languages, which are not described here again.
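As a rough illustration of this per-language counting rule (the Han-character range and the whitespace rule below are assumptions; the patent does not prescribe a specific counting routine):

```python
import re

def count_words(text, lang):
    # Count the minimal semantic units in a speech recognition result.
    if lang == "zh":
        # In Chinese, each Han character is the minimal unit with semantics.
        return len(re.findall(r"[\u4e00-\u9fff]", text))
    # In English, each whitespace-delimited word is the minimal unit.
    return len(text.split())
```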
In a specific implementation of step S103, when the number of words reaches the preset threshold, the current speech recognition result is a long sentence. In order to obtain the intention of the long sentence accurately, the current speech recognition result may be split to obtain M sentences, where M is greater than or equal to 2.
Regarding the specific splitting manner, the splitting may be performed according to punctuation marks, which may specifically be periods, exclamation marks, question marks, commas, and the like; for example, a complete sentence lies between two periods. Alternatively, the splitting may be performed according to semantics: for example, the multiple semantic meanings of the current speech recognition result are obtained, and the part carrying a single semantic meaning is taken as one sentence.
In a specific embodiment of the present invention, a preset regular expression may be adopted to split the current speech recognition result.
It should be noted that, the specific form of the preset regular expression may refer to the prior art, and the embodiment of the present invention is not limited to this.
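A minimal sketch of such regular-expression splitting follows; since the actual preset regular expression is not disclosed, the pattern below (covering common Chinese and Western punctuation) is an assumption for illustration:

```python
import re

# Assumed stand-in for the undisclosed "preset regular expression":
# split on Chinese and Western sentence punctuation.
SPLIT_PATTERN = re.compile(r"[，。！？；,.!?;]+")

def split_result(text):
    # Split the current speech recognition result into M sentences,
    # discarding empty fragments left by trailing punctuation.
    return [s.strip() for s in SPLIT_PATTERN.split(text) if s.strip()]
```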
Further, in a specific implementation of step S104, intention recognition may be performed on the M sentences respectively. The number of words contained in each of the M sentences obtained after splitting is smaller than the preset threshold, that is, the M sentences are short sentences, and the success rate of intention recognition is higher for short sentences than for long sentences; N intentions can thus be obtained, where N is less than or equal to M.
Those skilled in the art will understand that different sentences may yield the same intention; in other words, at least two of the N intentions may be identical.
That is, when intention recognition succeeds for all M sentences, N equals M, in which case the M sentences and the N intentions may correspond one to one; if intention recognition fails for at least one of the M sentences, that is, only N sentences are recognized successfully, N is smaller than M, in which case only the N successfully recognized sentences and the N intentions correspond one to one.
Furthermore, since M sentences are split from the current speech recognition result, N intents obtained from the M sentences can represent the intention of the current speech recognition result. In a specific implementation of step S105, the intention of the current speech recognition result may be determined at least according to the N intentions.
According to the embodiment of the present invention, the current speech recognition result is split according to the number of words it contains to obtain M sentences, intention recognition is performed on the M sentences, and the intention of the current speech recognition result is determined based on the N intentions of the M sentences. This avoids the situation in the prior art in which the intention of a long sentence (that is, a sentence whose number of words exceeds the preset threshold) cannot be recognized, improves the success rate of intention recognition for long sentences, and thereby improves the user's interactive experience.
In one non-limiting embodiment of the invention, N is a positive integer greater than or equal to 2. Referring to fig. 2, step S105 shown in fig. 1 may include the following steps:
step S201: calculating the importance of the sentences from which the N intentions were obtained;
step S202: selecting the intention of the sentence with the highest importance as the intention of the current speech recognition result.
In this way, the intention of the current speech recognition result can be determined from the plurality of intentions, the number of intentions of the current speech recognition result being 1.
In this embodiment, the importance of a sentence may be its semantic importance, which represents how important the sentence is within the current speech recognition result. By obtaining the importance of the sentences from which the N intentions were obtained, the sentence with the highest importance, which best represents the current speech recognition result, can be selected; the intention of that sentence is then the intention of the current speech recognition result.
Further, step S201 shown in fig. 2 may include: calculating the term frequency-inverse document frequency of each of the N sentences as the importance of that sentence.
In the embodiment of the present invention, the importance of a sentence can be represented by its term frequency-inverse document frequency (TF-IDF).
In a specific implementation, when calculating the TF-IDF value of a sentence, the sentence may first be segmented into words; TF-IDF values are calculated for the words obtained after segmentation; a preset number of words with the largest TF-IDF values, for example the 3 largest, are selected; and the average of the TF-IDF values of the selected words is taken as the TF-IDF value of the sentence.
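The procedure above can be sketched as follows, treating each of the N sentences as a document; whitespace tokenisation stands in for real word segmentation, and the exact TF-IDF weighting scheme is an assumption since the patent does not specify one:

```python
import math
from collections import Counter

def sentence_importance(sentences, top_k=3):
    # For each sentence: compute a TF-IDF value for every word it contains,
    # keep the top_k values, and use their mean as the sentence's importance.
    docs = [s.split() for s in sentences]
    n = len(docs)
    doc_freq = Counter(w for d in docs for w in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        tfidf = [(tf[w] / len(d)) * math.log(n / doc_freq[w]) for w in tf]
        top = sorted(tfidf, reverse=True)[:top_k]
        scores.append(sum(top) / len(top) if top else 0.0)
    return scores
```

A sentence made only of words shared by every other sentence scores zero, while distinctive wording raises a sentence's importance.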
In another non-limiting embodiment of the invention, N is a positive integer greater than or equal to 2. Referring to fig. 3, step S105 shown in fig. 1 may include the following steps:
step S301: determining the positions, in the current speech recognition result, of the sentences from which the N intentions were obtained;
step S302: selecting the intention of the last-positioned sentence as the intention of the current speech recognition result.
Unlike the previous embodiment, when determining the intention of the current speech recognition result, this embodiment selects the intention of the sentence positioned last.
Considering that, by users' language expression habits, important sentences are usually placed later, selecting the intention of the last-positioned sentence when choosing the user's final intention from the N intentions ensures the accuracy of user intention recognition.
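A minimal sketch of this last-position rule, where each pair holds a sentence's position in the current speech recognition result and the intent recognised from it (the pair representation and names are illustrative):

```python
def pick_last_intent(positioned_intents):
    # positioned_intents: list of (position, intent) pairs for the N
    # successfully recognised sentences. The intent of the sentence
    # positioned last in the utterance wins (steps S301-S302).
    position, intent = max(positioned_intents, key=lambda pair: pair[0])
    return intent
```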
In one non-limiting embodiment of the present invention, before the splitting in step S103 shown in fig. 1, the method may further include the following steps: judging whether the current speech recognition result has been broken into sentences with punctuation marks; and if not, performing sentence breaking on the current speech recognition result using a pre-trained sentence-breaking model.
As described above, when the current speech recognition result is split, it is split according to punctuation marks, so it must be ensured that the current speech recognition result has been broken into sentences with punctuation marks.
When the current speech recognition result has not been punctuated, the embodiment of the present invention may use the sentence-breaking model to break it into sentences. In particular, the sentence-breaking model may determine the sentences in the current speech recognition result and add punctuation marks after each sentence.
In particular, the sentence-breaking model may be a Deep Neural Network (DNN) language model. The sentence-break model is a trained model.
It should be noted that sample data may be selected in advance to train the sentence break model, so that the trained sentence break model is obtained. The specific process of training the DNN language model may refer to the prior art, and the embodiment of the present invention is not limited thereto.
In a specific application scenario, the user's current speech recognition result is the unpunctuated text "hello the weather is really good today this is the first time I have come to Guiyang I am not familiar with it at all do you know how to get to the Guiyang railway station". After sentence breaking with the DNN language model, the following text data is obtained: "Hello, the weather is really good today. This is the first time I have come to Guiyang, and I am not familiar with it at all. Do you know how to get to the Guiyang railway station?" Since the number of words of the text data is greater than 20, the text data is split into 5 short sentences: "hello", "the weather is really good today", "this is the first time I have come to Guiyang", "I am not familiar with it at all", and "do you know how to get to the Guiyang railway station". Intention recognition is performed on the 5 short sentences respectively, and the intentions of the second and fifth short sentences can be determined to be "weather" and "railway station route", respectively. The intention of the fifth short sentence (that is, "railway station route") is selected as the intention of the current speech recognition result.
Further, the answer to the intention may also be looked up in the knowledge base, yielding the following answer: "Route to the Guiyang railway station: walk 300 meters to the International Ecological Conference Center (north) stop and take bus 60 to the railway station; or walk 350 meters to the Guizhou Financial City (south) stop, take bus 23, get off at the old sun gate stop, and transfer to bus 219 to the railway station." The answer is fed back to the user, and may be presented in voice or text form.
In a non-limiting embodiment of the present invention, referring to fig. 4, the method shown in fig. 1 may further include the following steps:
step S401: when the number of words does not reach the preset threshold, judging whether the number of words contained in a previous speech recognition result, obtained before the current speech recognition result, reached the preset threshold and whether its intention recognition succeeded;
step S402: if the number of words contained in the previous speech recognition result did not reach the preset threshold and its intention recognition failed, merging at least the current speech recognition result and the previous speech recognition result;
step S403: performing intention recognition using the merged speech recognition result.
In the embodiment of the present invention, if the number of words contained in the current speech recognition result does not reach the preset threshold, the current speech recognition result is a short sentence, and an intention recognition process for short sentences can be triggered.
In a specific implementation, it can be judged whether the previous speech recognition result, obtained before the current one, is a short sentence and whether its intention recognition succeeded. When the previous speech recognition result is a short sentence and its intention recognition failed, the current speech recognition result may be merged with it; specifically, the two short sentences may be combined into one sentence, and intention recognition may then be performed on the combined sentence.
In the embodiment of the present invention, a current speech recognition result whose number of words does not reach the preset threshold and whose intention recognition failed can be merged with a previous speech recognition result whose number of words did not reach the preset threshold and whose intention recognition failed, and intention recognition can be performed again using the merged result, so as to improve the success rate of intention recognition for short sentences (that is, sentences whose number of words does not reach the preset threshold).
In a specific application scenario, the user's speech recognition result at time 1 is short sentence 1, "Guiyang"; 30 seconds later, the user's speech recognition result is short sentence 2, "today"; and another 30 seconds later, the user's speech recognition result is short sentence 3, "weather". When intention recognition fails for short sentences 1, 2, and 3, the three short sentences may be merged into sentence 4, "Guiyang today weather", and intention recognition may be performed on sentence 4 to obtain the intention "query today's weather in Guiyang".
Further, for this intention, the answer may be obtained by calling a third-party application, such as a weather application: "The highest temperature in Guiyang today is 14 degrees and the lowest is 8 degrees, with rain showers; the air pollution diffusion conditions are good, and the weather is favorable for the diffusion of air pollutants." The answer is fed back to the user, and may be presented in voice or text form.
Referring to fig. 5, step S402 shown in fig. 4 may include the following steps:
step S501: storing the current voice recognition result to a sentence list for caching;
step S502: and if the number of the recognition results in the sentence list cache is more than 1, merging all the voice recognition results in the sentence list cache.
In this embodiment, a sentence list cache is set to store sentences (i.e., recognition results) which contain words whose number does not reach a preset threshold and are intended to be recognized unsuccessfully, such as a current speech recognition result and previous speech recognition results.
When the number of recognition results in the sentence list cache is greater than 1, a merging operation may be performed, that is, all the speech recognition results in the sentence list cache are merged into one sentence for subsequent intent recognition.
Further, the intention recognition method shown in fig. 4 may further include the following steps: emptying the sentence list cache if intention recognition on the merged speech recognition result succeeds; or emptying the sentence list cache if the word count of the previous speech recognition result reaches the preset threshold or its intention recognition succeeds.
In this embodiment, the sentences in the sentence list cache are all short sentences, kept there to be merged for intention recognition. Thus, if intention recognition on the merged speech recognition result succeeds, the intention recognition of the sentences in the sentence list cache has succeeded, and the cache can be cleared.
In addition, if the previous speech recognition result is a long sentence, it is not applicable to the short-sentence intention recognition process and needs to be removed; the recognition results remaining in the sentence list cache then become semantically discontinuous, so the cache can be emptied. Likewise, if intention recognition on the previous speech recognition result succeeded, it does not need to enter the short-sentence intention recognition process and also needs to be removed; again the cached recognition results become semantically discontinuous, so the cache can be emptied.
In a specific implementation, the maximum capacity of the sentence list cache can be configured, for example, 5 short sentences, that is, 5 speech recognition results. In this case, if the number of recognition results in the sentence list cache reaches 5 and intention recognition still fails, the sentence list cache is emptied so as to keep the response time of intention recognition bounded.
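For illustration only (not part of the claimed subject matter), the cache behavior described above can be sketched in Python as follows; all names are hypothetical, and `MAX_CACHE` corresponds to the configurable maximum capacity of 5 short sentences:

```python
# Sketch of the sentence list cache: short sentences whose intention
# recognition failed are buffered and merged; the cache is cleared on
# success or when the capacity limit is reached while still failing.

MAX_CACHE = 5

class SentenceListCache:
    """Buffers short sentences whose intention recognition failed."""

    def __init__(self):
        self.items = []

    def add(self, result):
        self.items.append(result)

    def merged(self):
        # Merge all buffered recognition results into one sentence.
        return "".join(self.items)

    def clear(self):
        self.items = []

def handle_short_sentence(cache, current, recognize_intent):
    """`recognize_intent` is a stand-in for the real intent recognizer;
    it returns an intention on success, or None on failure."""
    cache.add(current)
    if len(cache.items) > 1:
        intent = recognize_intent(cache.merged())
        if intent is not None:
            cache.clear()  # success: the buffered sentences are consumed
            return intent
        if len(cache.items) >= MAX_CACHE:
            cache.clear()  # capacity reached and still failing: drop context
    return None
```

With the scenario above, feeding "贵阳", "今天" and "天气" in turn fails twice and then succeeds on the merged sentence "贵阳今天天气", after which the cache is empty; clearing on a long sentence or on a successful stand-alone recognition would simply call `clear()`.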
Further, step S403 shown in fig. 4 may include the following steps: calculating the smoothness of the merged speech recognition result; and if the smoothness reaches a preset threshold, performing intention recognition using the merged speech recognition result.
In the embodiment of the invention, the smoothness of a speech recognition result represents its semantic coherence. The higher the smoothness of a speech recognition result, the higher the probability that it is a complete sentence, and the higher the success rate of intention recognition on it.
Therefore, in order to improve the success rate of intention recognition, intention recognition is performed with the merged speech recognition result only when its smoothness reaches the preset threshold, which indicates that the merged result is semantically coherent.
Specifically, the smoothness may be calculated using a DNN language model. A specific process for calculating the smoothness of the merged speech recognition result may be: starting from the first word of the merged speech recognition result, calculate, using the DNN language model, the probability of the second word following the first word; in the same way, calculate the probability of the third word following the second word; and so on, until all words in the merged speech recognition result have been traversed, and then take the product of all the probabilities as the smoothness.
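For illustration, the chained computation described above can be sketched as follows; `next_word_prob` is a hypothetical stand-in for the trained DNN language model and returns the probability of a word given the words before it:

```python
# Sketch of the smoothness computation: the product of the language
# model's probability for each word given all preceding words.

def smoothness(words, next_word_prob):
    score = 1.0
    for i in range(1, len(words)):
        # P(words[i] | words[0..i-1]) from the language model
        score *= next_word_prob(words[:i], words[i])
    return score
```

A fluent merged result yields a larger product than a semantically discontinuous one, so comparing this product with the preset threshold implements the gate described above. In practice log-probabilities would typically be summed instead, to avoid numeric underflow on longer inputs.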
Referring to fig. 6, an intention identifying apparatus 60 is further disclosed in the embodiments of the present invention, and the intention identifying apparatus 60 may include an initial intention identifying module 601, a word number determining module 602, a splitting module 603, an intention identifying module 604, and an intention determining module 605.
The initial intention recognition module 601 is adapted to perform initial intention recognition on a current speech recognition result of the user, where the current speech recognition result is text data; the word number determination module 602 is adapted to determine the number of words contained in the current speech recognition result when the initial intention recognition fails, a word being a minimum unit of the text data; the splitting module 603 is adapted to split the current speech recognition result when the number of the words reaches a preset threshold, so as to obtain M sentences, where M is a positive integer greater than 1; the intention identifying module 604 is adapted to perform intention identification on the M sentences, respectively, to obtain N intentions, where N is a positive integer and is less than or equal to M; the intent determination module 605 is adapted to determine the intent of the current speech recognition result from at least the N intents.
In particular implementations, the word count determination module 602 may determine the number of words contained in the current speech recognition result, where a word is the minimum unit with semantics in the text data. For example, when the language of the text data is Chinese, a word refers to a single Chinese character; when the language of the text data is English, a word refers to a single word.
The splitting module 603 may split according to punctuation marks, which may specifically be periods, exclamation marks, question marks, commas and the like; for example, a complete sentence is formed between two periods. Alternatively, the splitting may be performed according to semantics: for example, the multiple semantics of the current speech recognition result are obtained, and the portion carrying a single semantics is taken as one sentence.
In a specific embodiment of the present invention, the splitting module 603 may split the current speech recognition result by using a preset regular expression.
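As a sketch only, such punctuation-based splitting with a regular expression might look like the following; the pattern is an assumption for illustration and not the patent's actual preset regular expression:

```python
import re

# Split on Chinese and Western sentence-ending punctuation and commas;
# empty fragments left by trailing punctuation are discarded.
SPLIT_PATTERN = re.compile(r"[。！？，.!?,]+")

def split_sentences(text):
    return [s for s in SPLIT_PATTERN.split(text) if s.strip()]
```

For example, splitting "今天天气怎么样？帮我订机票。" yields the two short sentences "今天天气怎么样" and "帮我订机票", which can then be passed to intention recognition individually.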
The intention recognition module 604 may then perform intention recognition on the M sentences respectively. The word count of each of the M sentences obtained after splitting is smaller than the preset threshold, that is, the M sentences are short sentences, and the success rate of intention recognition on short sentences is higher than on long sentences; N intentions can thus be obtained, where N is less than or equal to M.
Since the M sentences are split from the current speech recognition result, the N intentions obtained by the M sentences can represent the intention of the current speech recognition result. The intent determination module 605 may determine the intent of the current speech recognition result from at least the N intents.
According to the embodiment of the invention, whether to split the current speech recognition result is determined according to the number of words it contains; M sentences are obtained by splitting, intention recognition is performed on the M sentences, and the intention of the current speech recognition result is determined based on the N intentions of the M sentences. This avoids the situation in the prior art in which the intention of a long sentence (i.e., a sentence whose word count reaches the preset threshold) cannot be recognized, improves the success rate of intention recognition for long sentences, and thereby improves the user's interactive experience.
In one non-limiting embodiment of the present invention, the intention determining module 605 may comprise: an importance calculating unit, adapted to calculate the importance of each sentence from which the N intentions are obtained; and a first intention selecting unit, adapted to select the intention of the sentence with the highest importance as the intention of the current speech recognition result.
In other words, the intention of the current speech recognition result can be determined from the plurality of intentions, and the number of intentions finally attributed to the current speech recognition result is 1.
In this embodiment, the importance of a sentence may be its semantic importance, representing how important the sentence is within the current speech recognition result. By calculating the importance of the sentences yielding the N intentions, the sentence with the highest importance, i.e., the one most representative of the current speech recognition result, can be selected; its intention is then taken as the intention of the current speech recognition result.
In a specific implementation, the importance calculating unit may calculate the term frequency-inverse document frequency (TF-IDF) of each of the N sentences as its importance.
Specifically, when calculating the TF-IDF value of a sentence, the importance calculating unit may first segment the sentence into words, calculate a TF-IDF value for each word obtained, select a preset number of words with the largest TF-IDF values, for example the 3 largest, compute the average of the selected words' TF-IDF values, and take this average as the TF-IDF value of the sentence.
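The averaging step can be sketched as follows; `tfidf_of` is a hypothetical lookup built from some document collection, and the whitespace tokenizer is a deliberate simplification (Chinese text would require a real word segmenter):

```python
# Sketch of the sentence-importance computation: the average of the
# top_k largest per-word TF-IDF values in the sentence.

def sentence_importance(sentence, tfidf_of, top_k=3):
    words = sentence.split()  # naive segmentation, for illustration only
    top = sorted((tfidf_of(w) for w in words), reverse=True)[:top_k]
    return sum(top) / len(top) if top else 0.0
```

The sentence with the highest such score is then taken as the most representative, and its intention becomes the intention of the current speech recognition result.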
In another non-limiting embodiment of the invention, N is a positive integer greater than or equal to 2. The intent determination module 605 may include: a position determination unit, adapted to determine the positions, in the current speech recognition result, of the sentences from which the N intentions are obtained; and a second intention selecting unit, adapted to select the intention of the sentence positioned furthest back as the intention of the current speech recognition result.
Unlike the previous embodiment, when determining the intention of the current speech recognition result, this embodiment of the present invention selects the intention of the sentence positioned furthest back.
Considering that, as a matter of language habit, users generally place the important sentence toward the end, selecting the intention of the last-positioned sentence among the N intentions helps ensure the accuracy of user intention recognition.
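The position-based selection can be sketched as follows (all names hypothetical); each candidate pairs a sentence that yielded an intention with that intention:

```python
# Sketch of position-based selection: of the sentences that produced an
# intention, pick the one whose position in the original text is last.

def select_last_intent(text, sentence_intents):
    """sentence_intents: list of (sentence, intent) pairs; returns the
    intent of the sentence positioned furthest back in `text`."""
    sentence, intent = max(sentence_intents, key=lambda p: text.rfind(p[0]))
    return intent
```

For example, for a recognition result containing both a music request and a later weather request, the weather intention would be selected as the final intention.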
In one non-limiting embodiment of the present invention, the intention recognition apparatus 60 may further include: a judging module, adapted to judge whether the current speech recognition result has been broken into sentences according to punctuation marks; and a sentence-breaking module, adapted to break the current speech recognition result into sentences using a pre-trained sentence-breaking model when the current speech recognition result has not been broken into sentences according to punctuation marks.
As described above, the current speech recognition result is split according to punctuation marks, so it must be ensured that the current speech recognition result has been broken into sentences by punctuation marks.
If the current speech recognition result is not punctuated, the embodiment of the invention can use the sentence-breaking model to break it into sentences. Specifically, the sentence-breaking model may determine each sentence in the current speech recognition result and append a punctuation mark after it.
In particular, the sentence-breaking model may be a Deep Neural Network (DNN) language model that has been trained in advance.
It should be noted that sample data may be selected in advance to train the sentence-breaking model, thereby obtaining the trained model. For the specific process of training a DNN language model, reference may be made to the prior art; the embodiment of the present invention is not limited in this respect.
In one non-limiting embodiment of the present invention, the intention recognition apparatus 60 may further include: a short-sentence judging module, adapted to judge, when the word count of the current speech recognition result does not reach the preset threshold, whether the word count of the previous speech recognition result before the current speech recognition result reaches the preset threshold and whether its intention recognition succeeded; a merging module, adapted to merge at least the current speech recognition result and the previous speech recognition result when the word count of the previous speech recognition result does not reach the preset threshold and its intention recognition failed; and a recognition module, adapted to perform intention recognition using the merged speech recognition result.
In the embodiment of the invention, if the word count of the current speech recognition result does not reach the preset threshold, the current speech recognition result is a short sentence, and the intention recognition process for short sentences can be triggered.
In a specific implementation, it can be determined whether the previous speech recognition result is a short sentence and whether its intention recognition failed. If the previous speech recognition result is a short sentence whose intention recognition failed, the current speech recognition result may be merged with it; specifically, the two short sentences may be combined into one sentence, and intention recognition may then be performed on the combined sentence.
In the embodiment of the invention, a current speech recognition result whose word count does not reach the preset threshold and whose intention recognition failed can be merged with a previous speech recognition result of the same kind, and intention recognition can be performed again on the merged result, so as to improve the success rate of intention recognition for short sentences (i.e., sentences whose word count does not reach the preset threshold).
Further, the merging module may include: a storage unit, adapted to store the current speech recognition result into a sentence list cache; and a merging unit, adapted to merge all the speech recognition results in the sentence list cache when the number of recognition results in the sentence list cache is greater than 1.
In this embodiment, a sentence list cache is set up to store sentences (i.e., recognition results) whose word count does not reach the preset threshold and whose intention recognition failed, such as the current speech recognition result and the previous speech recognition results.
When the number of recognition results in the sentence list cache is greater than 1, a merging operation may be performed, that is, all the speech recognition results in the sentence list cache are merged into one sentence for subsequent intention recognition.
Further, the intention recognition apparatus 60 shown in fig. 6 may further include a clearing module, adapted to empty the sentence list cache when intention recognition on the merged speech recognition result succeeds, or to empty the sentence list cache when the word count of the previous speech recognition result reaches the preset threshold or its intention recognition succeeds.
In this embodiment, the sentences in the sentence list cache are all short sentences, kept there to be merged for intention recognition. Thus, if intention recognition on the merged speech recognition result succeeds, the intention recognition of the sentences in the sentence list cache has succeeded, and the cache can be cleared.
In addition, if the previous speech recognition result is a long sentence, it is not applicable to the short-sentence intention recognition process and needs to be removed; the recognition results remaining in the sentence list cache then become semantically discontinuous, so the cache can be emptied. Likewise, if intention recognition on the previous speech recognition result succeeded, it does not need to enter the short-sentence intention recognition process and also needs to be removed; again the cached recognition results become semantically discontinuous, so the cache can be emptied.
In a specific implementation, the maximum capacity of the sentence list cache can be configured, for example, 5 short sentences, that is, 5 speech recognition results. In this case, if the number of recognition results in the sentence list cache reaches 5 and intention recognition still fails, the sentence list cache is emptied so as to keep the response time of intention recognition bounded.
In a particular embodiment, the recognition module may include: a smoothness calculation unit, adapted to calculate the smoothness of the merged speech recognition result; and a recognition unit, adapted to perform intention recognition using the merged speech recognition result when the smoothness reaches a preset threshold.
In the embodiment of the invention, the smoothness of a speech recognition result represents its semantic coherence. The higher the smoothness of a speech recognition result, the higher the probability that it is a complete sentence, and the higher the success rate of intention recognition on it.
Therefore, in order to improve the success rate of intention recognition, intention recognition is performed with the merged speech recognition result only when its smoothness reaches the preset threshold, which indicates that the merged result is semantically coherent.
Specifically, the smoothness may be calculated using a DNN language model. A specific process for calculating the smoothness of the merged speech recognition result may be: starting from the first word of the merged speech recognition result, calculate, using the DNN language model, the probability of the second word following the first word; in the same way, calculate the probability of the third word following the second word; and so on, until all words in the merged speech recognition result have been traversed, and then take the product of all the probabilities as the smoothness.
For more details on the operating principle and operating manner of the intention recognition apparatus 60, reference may be made to the related descriptions of figures 1 to 5, which are not repeated here.
The embodiment of the invention also discloses a storage medium having computer instructions stored thereon; when the computer instructions are executed, the steps of the methods shown in figures 1 to 5 may be performed. The storage medium may include ROM, RAM, magnetic disks or optical disks, etc. The storage medium may further include a non-volatile memory or a non-transitory memory, and the like.
The embodiment of the invention also discloses a terminal, which may comprise a memory and a processor, the memory storing computer instructions executable on the processor. When executing the computer instructions, the processor may perform the steps of the methods shown in figures 1 to 5. The terminal includes, but is not limited to, terminal devices such as mobile phones, computers and tablet computers.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (11)

1. An intent recognition method, comprising:
performing initial intention recognition on a current voice recognition result of a user, wherein the current voice recognition result is text data;
when the initial intention recognition fails, determining the number of words contained in the current voice recognition result, wherein the words are the minimum units with semantics in the text data;
when the number of the characters reaches a preset threshold, splitting the current voice recognition result to obtain M sentences, wherein M is a positive integer greater than 1;
respectively carrying out intention identification on the M sentences to obtain N intentions, wherein N is a positive integer and is less than or equal to M;
determining an intention of the current speech recognition result at least according to the N intentions;
when the number of the characters does not reach a preset threshold, judging whether the number of the characters contained in a previous voice recognition result before the current voice recognition result reaches the preset threshold and whether the intention recognition is successful; if the number of words contained in the previous voice recognition result does not reach a preset threshold and the intention recognition fails, storing the current voice recognition result into a sentence list cache, and if the number of the recognition results in the sentence list cache is greater than 1, merging all the voice recognition results in the sentence list cache;
and performing intention recognition by using the combined voice recognition result.
2. The method according to claim 1, wherein N is a positive integer greater than or equal to 2, and the determining the intention corresponding to the current speech recognition result according to at least the N intentions comprises:
calculating the importance of the sentences of the N intentions;
and selecting the intention of the sentence with the highest importance degree as the intention of the current voice recognition result.
3. The method according to claim 2, wherein the calculating the importance of the sentences of the N intentions comprises:
calculating the term frequency-inverse document frequency of each of the N sentences, respectively, as the importance of each of the N sentences.
4. The method according to claim 1, wherein the determining the intention corresponding to the current speech recognition result according to at least the N intentions comprises:
determining the positions, in the current speech recognition result, of the sentences from which the N intentions are obtained;
and selecting the intention of the sentence positioned furthest back as the intention of the current speech recognition result.
5. The intent recognition method of claim 1, wherein the splitting the current speech recognition result comprises:
and splitting the current voice recognition result by adopting a preset regular expression.
6. The intent recognition method according to claim 1, wherein the splitting of the current speech recognition result further comprises:
judging whether the current speech recognition result has been broken into sentences according to punctuation marks;
and if the current speech recognition result has not been broken into sentences, performing sentence-breaking on the current speech recognition result by using a pre-trained sentence-breaking model.
7. The intention recognition method according to claim 1, further comprising:
emptying the sentence list cache if the intention recognition of the merged speech recognition result is successful; or emptying the sentence list cache if the number of the words contained in the previous voice recognition result reaches a preset threshold or the intention recognition is successful.
8. The intent recognition method according to claim 1, wherein the performing intent recognition using the merged speech recognition result comprises:
calculating the smoothness of the combined voice recognition result;
and if the smoothness reaches a preset threshold value, performing intention recognition by using the combined voice recognition result.
9. An intention recognition apparatus, comprising:
the system comprises an initial intention identification module, a voice recognition module and a voice recognition module, wherein the initial intention identification module is suitable for carrying out initial intention identification on a current voice recognition result of a user, and the current voice recognition result is text data;
the word number determining module is suitable for determining the number of words contained in the current voice recognition result when the initial intention recognition fails, wherein the words are the minimum units with semantics in the text data;
the splitting module is suitable for splitting the current voice recognition result when the number of the characters reaches a preset threshold so as to obtain M sentences, wherein M is a positive integer greater than 1;
the intention identification module is suitable for respectively carrying out intention identification on the M sentences to obtain N intentions, wherein N is a positive integer and is less than or equal to M;
an intent determination module adapted to determine an intent of the current speech recognition result based at least on the N intents;
the phrase judgment module is used for judging whether the number of the characters contained in the previous voice recognition result before the current voice recognition result reaches a preset threshold and whether the intention recognition is successful or not when the number of the characters does not reach the preset threshold;
a merging module, configured to store the current speech recognition result in a sentence list cache when the number of words included in the previous speech recognition result does not reach a preset threshold and the intention recognition fails, and merge all speech recognition results in the sentence list cache when the number of recognition results in the sentence list cache is greater than 1;
and the recognition module is used for performing intention recognition by utilizing the combined voice recognition result.
10. A storage medium having stored thereon computer instructions, wherein said computer instructions when executed perform the steps of the intent recognition method of any of claims 1-8.
11. A terminal comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the intent recognition method of any of claims 1-8.
CN201910356912.5A 2019-04-29 2019-04-29 Intention recognition method and device, storage medium and terminal Active CN110097886B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910356912.5A CN110097886B (en) 2019-04-29 2019-04-29 Intention recognition method and device, storage medium and terminal

Publications (2)

Publication Number Publication Date
CN110097886A CN110097886A (en) 2019-08-06
CN110097886B true CN110097886B (en) 2021-09-10


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112700767B (en) * 2019-10-21 2022-08-26 思必驰科技股份有限公司 Man-machine conversation interruption method and device
CN112017663B (en) * 2020-08-14 2024-04-30 博泰车联网(南京)有限公司 Voice generalization method and device and computer storage medium
CN112069786A (en) * 2020-08-25 2020-12-11 北京字节跳动网络技术有限公司 Text information processing method and device, electronic equipment and medium
CN112992151B (en) * 2021-03-15 2023-11-07 中国平安财产保险股份有限公司 Speech recognition method, system, device and readable storage medium
CN114238566A (en) * 2021-12-10 2022-03-25 零犀(北京)科技有限公司 Data enhancement method and device for voice or text data
CN115577092B (en) * 2022-12-09 2023-03-24 深圳市人马互动科技有限公司 User speech processing method and device, electronic equipment and computer storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8539349B1 (en) * 2006-10-31 2013-09-17 Hewlett-Packard Development Company, L.P. Methods and systems for splitting a chinese character sequence into word segments
CN104391980A (en) * 2014-12-08 2015-03-04 百度在线网络技术(北京)有限公司 Song generating method and device
CN106874419A (en) * 2017-01-22 2017-06-20 北京航空航天大学 A kind of real-time focus polymerization of many granularities
CN106980686A (en) * 2017-03-31 2017-07-25 努比亚技术有限公司 The segmenting method and terminal of a kind of search term
CN107315737A (en) * 2017-07-04 2017-11-03 北京奇艺世纪科技有限公司 A kind of semantic logic processing method and system
CN108091328A (en) * 2017-11-20 2018-05-29 北京百度网讯科技有限公司 Speech recognition error correction method, device and readable medium based on artificial intelligence

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7711545B2 (en) * 2003-07-02 2010-05-04 Language Weaver, Inc. Empirical methods for splitting compound words with application to machine translation



Similar Documents

Publication Publication Date Title
CN110097886B (en) Intention recognition method and device, storage medium and terminal
US11435898B2 (en) Modality learning on mobile devices
CN110765244B (en) Method, device, computer equipment and storage medium for obtaining answering operation
CN110210029B (en) Method, system, device and medium for correcting error of voice text based on vertical field
CN107240398B (en) Intelligent voice interaction method and device
CN107305575B (en) Sentence-break recognition method and device for a human-machine intelligent question-answering system
JP5901001B1 (en) Method and device for acoustic language model training
US11068519B2 (en) Conversation oriented machine-user interaction
CN110930980B (en) Acoustic recognition method and system for Chinese and English mixed voice
CN112100349A (en) Multi-turn dialogue method and device, electronic equipment and storage medium
CN110415679B (en) Voice error correction method, device, equipment and storage medium
JP6677419B2 (en) Voice interaction method and apparatus
JP7213943B2 (en) Audio processing method, device, device and storage medium for in-vehicle equipment
JP2010518534A (en) Contextual input method
CN111428010A (en) Man-machine intelligent question and answer method and device
WO2014117553A1 (en) Method and system of adding punctuation and establishing language model
CN111727442A (en) Training sequence generation neural network using quality scores
CN112686051A (en) Semantic recognition model training method, recognition method, electronic device, and storage medium
CN113486170A (en) Natural language processing method, device, equipment and medium based on man-machine interaction
KR102102287B1 (en) Method for crowdsourcing data of chat model for chatbot
CN114399772A (en) Sample generation, model training and trajectory recognition methods, devices, equipment and medium
CN110020429B (en) Semantic recognition method and device
JP7096199B2 (en) Information processing equipment, information processing methods, and programs
CN115620726A (en) Voice text generation method, and training method and device of voice text generation model
CN112036135B (en) Text processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant