WO2021174717A1 - Text intent recognition method, apparatus, computer device and storage medium - Google Patents


Info

Publication number
WO2021174717A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
intent
preset
processed
similarity
Prior art date
Application number
PCT/CN2020/097006
Other languages
English (en)
French (fr)
Inventor
辛亮亮
倪合强
白云
潘影波
孙强
Original Assignee
苏宁易购集团股份有限公司
苏宁云计算有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏宁易购集团股份有限公司, 苏宁云计算有限公司
Priority to CA3174601A
Publication of WO2021174717A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Definitions

  • This application relates to the field of text processing technology, and in particular to a method, device, computer equipment and storage medium for text intent recognition.
  • the intent corresponding to the text content can be determined according to the text content.
  • classification methods are generally used to divide sentences into corresponding intention types.
  • NLU (Natural Language Understanding)
  • the traditional way is to use an algorithm to extract the intention of text content.
  • annotated corpus in a uniform format is input into an algorithm, and the intent of the text content is determined by comparing the confidence level output by the algorithm or the classification result.
  • When the labeled corpus is insufficient, the algorithm used to determine the intent of text content is trained on that insufficient corpus, and the intent is finally determined with the trained algorithm. Because of the insufficient labeled corpus, the accuracy of the finally recognized intent is low.
  • A method for recognizing text intent, comprising: obtaining a text to be processed; inputting the text to be processed into a text classification model to obtain a similar corpus of the text to be processed output by the text classification model and a first similarity between the similar corpus and the text to be processed, the text classification model being trained on a corpus labeled with intents; determining a first candidate intent of the text to be processed based on the similar corpus; extracting entity information of the text to be processed, and obtaining a second candidate intent of the text to be processed according to the entity information; obtaining a second similarity between the entity information and the text to be processed; and screening the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity.
  • In the text intent recognition method, when the first similarity is greater than a first preset value and less than a second preset value, the step of extracting entity information of the text to be processed and obtaining the second candidate intent of the text to be processed according to the entity information is entered, where the first preset value is less than the second preset value.
  • The text intent recognition method further includes: when the first similarity is greater than or equal to the second preset value, taking the first candidate intent as the final intent; and/or when the first similarity is less than or equal to the first preset value, generating a prompt message.
  • Extracting entity information of the text to be processed includes: acquiring a plurality of preset word types, each of which is associated with a first preset intent; acquiring a word search algorithm corresponding to each preset word type, the word search algorithm being used to find the words corresponding to that preset word type; extracting, according to the word search algorithm corresponding to each preset word type, the words corresponding to each preset word type from the text to be processed, to obtain multiple first target words of the text to be processed; and generating the entity information based on the multiple first target words.
  • Obtaining the second candidate intent of the text to be processed according to the entity information includes: obtaining a preset intent set, the preset intent set including a plurality of second preset intents, each second preset intent being associated with a plurality of preset words; obtaining the plurality of first target words in the entity information; filtering out the target intent from the preset intent set according to the plurality of first target words and the preset words associated with each second preset intent in the preset intent set; and determining the second candidate intent according to the target intent.
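  • Filtering out the target intent from the preset intent set according to the plurality of first target words and the preset words associated with each second preset intent in the preset intent set includes: obtaining preset keywords; when the preset keywords are included in the plurality of first target words, performing word matching between the preset keywords and the preset words associated with each second preset intent, screening out a first target sub-candidate intent from the plurality of second preset intents in the preset intent set according to the result of the word matching, and taking the first target sub-candidate intent as the target intent; when the preset keywords are not included in the plurality of first target words, performing word matching between each of the plurality of first target words and the preset words associated with each second preset intent, screening out a second target sub-candidate intent from the plurality of second preset intents in the preset intent set according to the result of the word matching, and taking the second target sub-candidate intent as the target intent.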
  • Obtaining the second similarity between the entity information and the text to be processed includes: obtaining a first sub-similarity between the first target word corresponding to the target intent and the text to be processed; when there are multiple target intents, there are multiple first sub-similarities, and the highest of them is taken as the second similarity; when there is one target intent, the first sub-similarity is taken as the second similarity. Screening the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity includes: when the first similarity is greater than or equal to the second similarity, taking the first candidate intent as the final intent of the text to be processed; when the first similarity is less than the second similarity and the second candidate intent contains multiple target intents, taking the target intent corresponding to the second similarity as the final intent of the text to be processed; when the first similarity is less than the second similarity and the second candidate intent contains one target intent, taking that target intent as the final intent of the text to be processed.
  • Acquiring the second similarity between the entity information and the text to be processed includes: segmenting the text to be processed to obtain multiple second target words of the text to be processed; acquiring a first number, which is the number of first target words, and a second number, which is the number of second target words; and obtaining the ratio of the first number to the second number and determining the second similarity according to the ratio.
  • A text intent recognition device, comprising: a first acquisition module for acquiring a text to be processed; a second acquisition module for inputting the text to be processed into a text classification model to obtain a similar corpus of the text to be processed output by the text classification model and a first similarity between the similar corpus and the text to be processed, the text classification model being trained on a corpus labeled with intents; a first determination module for determining a first candidate intent of the text to be processed based on the similar corpus; a third acquisition module for extracting entity information of the text to be processed and obtaining a second candidate intent of the text to be processed according to the entity information; a fourth acquisition module for acquiring a second similarity between the entity information and the text to be processed; and a second determination module for screening the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity.
  • a computer device includes a memory, a processor, and a computer program that is stored on the memory and can run on the processor.
  • the processor implements the steps of any of the above-mentioned embodiments when the computer program is executed.
  • the text to be processed is first input into the text classification model to obtain the similar corpus and the first degree of similarity between the similar corpus and the text to be processed.
  • the first intention of the text to be processed is determined based on similar corpus.
  • the entity information of the text to be processed is extracted, and the second intention of the text to be processed is obtained according to the entity information.
  • the second degree of similarity between the entity information and the text to be processed is obtained.
  • the intention of the text to be processed is determined according to the first degree of similarity and the second degree of similarity, and the intention of the text to be processed is the first intention or the second intention.
  • The first intent and the second intent of the text to be processed are determined through the text classification model and the entity information of the text to be processed, respectively, and the final intent is determined to be the first intent or the second intent according to the first similarity and the second similarity. The intent of the text to be processed is thus recognized in more than one way, which avoids the low recognition accuracy caused by insufficient labeled corpus when only a single text classification model is used, and improves the accuracy of intent recognition for text content.
  • FIG. 1 is an application environment diagram of a text intent recognition method in an embodiment
  • FIG. 2 is a schematic flowchart of a text intent recognition method in an embodiment
  • FIG. 3 is a schematic flowchart of a text intent recognition method in another embodiment
  • FIG. 4 is a schematic flowchart of S108 in an embodiment
  • FIG. 5 is a schematic flowchart of S108 in another embodiment
  • FIG. 6 is a schematic flowchart of S1085 in an embodiment
  • FIG. 7 is a schematic flowchart of S110 in an embodiment
  • FIG. 8 is a structural block diagram of a text intent recognition device in an embodiment
  • FIG. 9 is an internal structure diagram of a computer device in an embodiment.
  • the text intent recognition method provided by this application is applied in the application environment as shown in FIG. 1.
  • Users can interact with the corresponding service platform through various applications on the terminal.
  • the user can send the text of the question and answer type to the corresponding service platform through the application on the terminal to receive the reply information issued by the service platform.
  • the client server is a server supporting the service platform.
  • the service platform receives the text of the question and answer type sent by the user through the client server, that is, the text to be processed is received.
  • the text to be processed is input into the text classification model, and the similar corpus of the text to be processed output by the text classification model and the first similarity between the similar corpus and the text to be processed are obtained.
  • the first candidate intent of the text to be processed is determined based on similar corpus.
  • the service platform extracts the entity information of the text to be processed, obtains the second candidate intent of the text to be processed according to the entity information, and obtains the second degree of similarity between the entity information and the text to be processed.
  • the final intent of the text to be processed is screened among the first candidate intent and the second candidate intent.
  • the final intention is the intention corresponding to the question and answer type text sent by the user.
  • the service platform reads the corresponding reply answer according to the obtained intention, and sends the reply answer to the user's terminal.
  • the terminal here may be a hardware device such as a computer, a tablet computer, and a smart phone.
  • the client server can be implemented by a single server or a server cluster composed of multiple servers.
  • a method for text intent recognition is provided.
  • The method is described by taking its application to the service platform (specifically, the client server supporting the service platform) in FIG. 1 as an example, and includes the following steps:
  • S102 Obtain the text to be processed.
  • the user sends a question-and-answer text message to the service platform through the terminal.
  • the service platform receives the text information of the question and answer type sent by the user, and uses the text information as the text to be processed.
  • the text to be processed is used to characterize the user's intention, and the user's intention can be obtained by performing intention recognition on the text to be processed.
  • the text to be processed may be a text indicating the intent of the user to consult, such as "return application has been submitted", "the mobile phone I bought is broken", "where is my goods", and so on.
  • S104 Input the text to be processed into the text classification model to obtain the similar corpus of the text to be processed output by the text classification model and the first similarity between the similar corpus and the text to be processed. The text classification model is trained on a corpus labeled with intents.
  • After the service platform obtains the text to be processed, it inputs the text to be processed into the text classification model.
  • the text classification model has been trained on the corpus with annotated intent.
  • the text classification model is used to recognize the text to be processed according to the corpus that has been marked with intent, and output candidate similar corpus similar to the text to be processed and the similarity between the candidate similar corpus and the text to be processed.
  • the candidate similar corpus can be one or more.
  • the similarity between the candidate similar corpus and the text to be processed can also be one or more.
  • the candidate similar corpus with the highest similarity is selected as the similar corpus of the text to be processed, and the highest similarity is the first similarity between the similar corpus and the text to be processed.
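  • The selection described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Candidate` type, the function name, and the sample values are hypothetical, and the candidates are assumed to have already been returned by some text classification model.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    corpus: str        # a labeled corpus entry returned by the model (hypothetical)
    similarity: float  # similarity to the text to be processed, in [0, 1]


def pick_similar_corpus(candidates):
    """Return the candidate similar corpus with the highest similarity."""
    if not candidates:
        raise ValueError("the model returned no candidate similar corpus")
    return max(candidates, key=lambda c: c.similarity)


# Hypothetical model output for one text to be processed.
candidates = [
    Candidate("where is my order", 0.82),
    Candidate("I want to return my phone", 0.41),
]
best = pick_similar_corpus(candidates)
first_similarity = best.similarity  # the "first similarity" of the method
```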
  • the text classification model may be a Text-CNN model (text convolution model).
  • The question-and-answer corpus labeled at the sentence level (the corpus labeled with intent) can undergo operations such as stop-word removal before model training. For example, useless words such as modal particles are removed.
  • Stop words are likewise removed from the text to be processed, and the resulting text is input into the trained text classification model to obtain the candidate similar corpus and the corresponding similarity.
  • S106 Determine the first candidate intent of the text to be processed according to the similar corpus.
  • After the service platform determines the similar corpus of the text to be processed according to the text classification model, it acquires the user intent corresponding to the similar corpus and takes that user intent as the first candidate intent of the text to be processed.
  • a plurality of intent-labeled corpora are stored in the service platform.
  • The similar corpus output when the text to be processed is input into the text classification model carries a labeled intent.
  • the first candidate intent of the text to be processed can be determined.
  • After the service platform obtains the similar corpus, it obtains the corresponding standard corpus based on the similar corpus, and then determines the first candidate intent of the text to be processed based on the standard corpus.
  • The standard corpus has been labeled with intent. According to the standard question, the first candidate intent of the text to be processed can be determined.
  • The service platform stores question-and-answer corpus labeled at the sentence level (that is, labeled with intent), such as after-sales corpus. For the after-sales corpus, the standard questions (question-and-answer corpus labeled at the sentence level) and similar questions are as follows:
  • the intent field corresponds to the standard question
  • the text field corresponds to the similar question.
  • S108 Extract entity information of the text to be processed, and obtain a second candidate intent of the text to be processed according to the entity information.
  • the service platform extracts entity information of the text to be processed.
  • the entity information may be information formed by word segmentation in the text to be processed.
  • the entity information includes category words, brand words, hot words, and keywords.
  • the entity information may also be entity information determined according to the text content of the text to be processed.
  • the semantics of the text to be processed is determined according to the text content of the text to be processed, and the semantics of the text to be processed is used as the entity information.
  • the service platform obtains the second candidate intent of the text to be processed according to the entity information.
  • The service platform contains multiple preset intents, and each preset intent corresponds to associated information. According to the matching relationship between the entity information and the associated information of each preset intent, the second candidate intent of the text to be processed can be determined.
  • S110 Obtain the second similarity between the entity information and the text to be processed.
  • the second degree of similarity may be the degree of semantic similarity between the entity information and the text to be processed.
  • When the entity information consists of one or more word segments, the second similarity can also be determined according to the ratio between those word segments and the text to be processed.
  • the second degree of similarity characterizes the degree of similarity between the entity information and the text to be processed.
  • S112 Screen the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first degree of similarity and the second degree of similarity.
  • The service platform determines the first similarity and the first candidate intent of the text to be processed according to the text classification model, determines the second similarity and the second candidate intent of the text to be processed according to the entity information of the text to be processed, and then screens the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity.
  • When the first similarity is greater than or equal to the second similarity, the final intent is the first candidate intent.
  • When the first similarity is less than the second similarity, the final intent is the second candidate intent.
  • The candidate intent corresponding to the larger similarity is taken as the final intent of the text to be processed. This makes the final intent more accurate and avoids the low recognition accuracy that results from determining the intent of the text to be processed with a single method.
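  • The screening rule of S112 can be sketched in a few lines. The function and argument names are hypothetical, and both similarities are assumed to be on the same scale so that they can be compared directly.

```python
def screen_final_intent(first_intent, first_similarity,
                        second_intent, second_similarity):
    """Pick the candidate intent backed by the larger similarity.

    Ties go to the first candidate intent, matching the
    "greater than or equal to" wording of the rule.
    """
    if first_similarity >= second_similarity:
        return first_intent
    return second_intent


# Hypothetical values: the entity-based candidate wins here.
final_intent = screen_final_intent("logistics query", 0.70, "after-sales", 0.75)
tie_intent = screen_final_intent("logistics query", 0.70, "after-sales", 0.70)
```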
  • the text to be processed is first input into the text classification model to obtain the similar corpus and the first similarity between the similar corpus and the text to be processed.
  • the first intention of the text to be processed is determined based on similar corpus.
  • the entity information of the text to be processed is extracted, and the second intention of the text to be processed is obtained according to the entity information.
  • the second degree of similarity between the entity information and the text to be processed is obtained.
  • the intention of the text to be processed is determined according to the first degree of similarity and the second degree of similarity, and the intention of the text to be processed is the first intention or the second intention.
  • The first intent and the second intent of the text to be processed are determined through the text classification model and the entity information of the text to be processed, respectively, and the final intent is determined to be the first intent or the second intent according to the first similarity and the second similarity. The intent of the text to be processed is thus recognized in more than one way, which avoids the low recognition accuracy caused by insufficient labeled corpus when only a single text classification model is used, and improves the accuracy of intent recognition for text content.
  • Before entering step S108, the service platform checks a precondition.
  • The precondition is that the first similarity is greater than the first preset value and less than the second preset value, where the first preset value is less than the second preset value.
  • When the precondition is met, step S108 is entered.
  • When the precondition is not met, there are two cases. In case 1 (step S1074), when the first similarity is greater than or equal to the second preset value, the first candidate intent is taken as the final intent. In case 2 (step S1072), when the first similarity is less than or equal to the first preset value, a prompt message is generated.
  • the text to be processed after the stop words are removed is classified and recognized using the trained text classification model, and the candidate similar corpus output by the model and the similarity between the candidate similar corpus and the text to be processed are obtained.
  • When there are multiple candidate similar corpora, there are also multiple similarities between the candidate similar corpora and the text to be processed, and the candidate similar corpora are sorted by similarity.
  • The service platform takes the candidate similar corpus with the highest similarity. If that similarity is greater than or equal to the second preset value (for example, 95%), the intent corresponding to that candidate similar corpus is taken directly as the first candidate intent and used as the final intent; the procedure ends and step S108 does not need to be executed. If that similarity is greater than the first preset value (for example, 60%) and less than the second preset value, step S108 is executed.
  • If that similarity is less than or equal to the first preset value, a prompt message is generated and step S108 does not need to be executed. In this way, the ability of the service platform to recognize the intent of the text to be processed can be improved.
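  • The three-way branching on the first similarity can be sketched as below. The 60% and 95% thresholds are the example values given in the text; the `Route` enum and the function name are hypothetical.

```python
from enum import Enum


class Route(Enum):
    USE_FIRST_CANDIDATE = "take the first candidate intent as the final intent"
    EXTRACT_ENTITIES = "enter step S108"
    PROMPT_USER = "generate a prompt message"


FIRST_PRESET = 0.60   # example value from the text
SECOND_PRESET = 0.95  # example value from the text


def route(first_similarity, low=FIRST_PRESET, high=SECOND_PRESET):
    """Decide what to do next based on the first similarity."""
    if not low < high:
        raise ValueError("the first preset value must be less than the second")
    if first_similarity >= high:
        return Route.USE_FIRST_CANDIDATE
    if first_similarity > low:
        return Route.EXTRACT_ENTITIES
    return Route.PROMPT_USER
```

  • Note that the boundaries are asymmetric: a similarity exactly equal to the first preset value produces a prompt, while one exactly equal to the second preset value accepts the first candidate intent.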
  • step S108 includes:
  • S1082 Acquire a plurality of preset word types, and each preset word type is associated with a first preset intention.
  • S1084 Obtain a word search algorithm corresponding to each preset word type, and the word search algorithm is used to search for words corresponding to each preset word type.
  • S1086 Extract words corresponding to each preset word type from the text to be processed according to the word search algorithm corresponding to each preset word type to obtain multiple first target words of the text to be processed.
  • S1088 Generate entity information according to the multiple first target words.
  • a plurality of preset word types are preset in the service platform, and each preset word type is associated with a corresponding first preset intention.
  • multiple preset word types include category words, hot words, brand words, and keywords.
  • Category words, hot words, brand words, and keywords each correspond to one or more first preset intents.
  • the word search algorithm corresponding to each preset word type is used to find words corresponding to each preset word type.
  • the service platform extracts words corresponding to each preset word type from the text to be processed according to the word search algorithm corresponding to each preset word type, and obtains a plurality of first target words of the text to be processed.
  • the word search algorithm corresponding to each preset word type may be the same word search algorithm.
  • the word search algorithm may be a dictionary tree search algorithm.
  • entity information is generated according to the multiple first target words.
  • the entity information may include multiple first target words, or may be other information that does not include the first target words generated based on the multiple first target words. Therefore, the ability of the service platform to extract the entity information of the text to be processed can be improved.
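  • As an illustration of a dictionary tree (trie) search, the sketch below stores dictionary words together with their preset word type and scans the text left to right, taking the longest dictionary word at each position. The longest-match strategy, the class and function names, and the sample dictionary are assumptions; the text only states that a dictionary tree search algorithm may be used.

```python
class Trie:
    def __init__(self):
        self.children = {}
        self.word_type = None  # set only on nodes that end a dictionary word

    def insert(self, word, word_type):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.word_type = word_type


def find_words(trie, text):
    """Return (word, word_type) pairs found in text, longest match first."""
    found = []
    i = 0
    while i < len(text):
        node, j, last = trie, i, None
        # Walk the trie as far as the text allows, remembering the
        # longest complete dictionary word seen so far.
        while j < len(text) and text[j] in node.children:
            node = node.children[text[j]]
            j += 1
            if node.word_type is not None:
                last = (text[i:j], node.word_type, j)
        if last:
            found.append(last[:2])
            i = last[2]  # continue after the matched word
        else:
            i += 1
    return found


# Hypothetical dictionary with two preset word types.
trie = Trie()
trie.insert("mobile phone", "category")
trie.insert("broken", "keyword")
words = find_words(trie, "the mobile phone i bought is broken")
```

  • The scan works on characters rather than on whitespace-separated tokens, which suits text without word delimiters; for space-delimited languages it would also match dictionary words embedded inside longer words.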
  • The text to be processed is segmented, and named entity recognition (NER) is performed on the segmentation result using the word-level corpus to obtain the entity information in the text to be processed.
  • Entity information can include category, brand, hot words, keywords, etc.
  • step S108 further includes:
  • S1081 Obtain a preset intent set, the preset intent set includes a plurality of second preset intents, and each second preset intent is associated with a plurality of preset words.
  • S1085 Filter out the target intention from the preset intention set according to the plurality of first target words and the preset words associated with each second preset intention in the preset intention set, and determine the second candidate intention according to the target intention.
  • the service platform is preset with a preset intent set.
  • the preset intention set includes a plurality of second preset intentions, and each second preset intention is associated with a plurality of preset words.
  • For example, when the second preset intent is a purchase intent, the associated preset words may include "buy", "sell", and so on.
  • When the second preset intent is an after-sales intent, the associated preset words may include "sold" and "broken".
  • the target intentions can be filtered from the preset intention set according to the plurality of first target words and the preset words associated with each second preset intention.
  • the target intention can be one or more.
  • According to the target intent, the service platform can determine the second candidate intent. The target intent is thus filtered from the preset intent set using the multiple first target words in the entity information, and the second candidate intent is then determined from the target intent, so that the service platform can quickly obtain the second candidate intent.
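  • A minimal sketch of filtering target intents by word matching follows. The preset intent set reuses the example words from the text; exact string equality is assumed as the matching criterion, and the function name is hypothetical.

```python
# Preset intent set: each second preset intent maps to its preset words.
PRESET_INTENTS = {
    "purchase": {"buy", "sell"},
    "after-sales": {"sold", "broken"},
}


def filter_target_intents(first_target_words, preset_intents=PRESET_INTENTS):
    """Map each matching preset intent to the target words that hit it."""
    hits = {}
    for intent, preset_words in preset_intents.items():
        matched = [w for w in first_target_words if w in preset_words]
        if matched:
            hits[intent] = matched
    return hits


target_intents = filter_target_intents(["mobile phone", "broken"])
```

  • Keeping the matched words per intent is a design choice made here so that a later step can compute a similarity for each target intent.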
  • step S1085 includes:
  • S10854 When the preset keywords are included in the plurality of first target words, perform word matching between the preset keywords and the preset words associated with each second preset intent, screen out a first target sub-candidate intent from the plurality of second preset intents in the preset intent set according to the result of the word matching, and use the first target sub-candidate intent as the target intent.
  • S10856 When the preset keywords are not included in the plurality of first target words, perform word matching between each of the plurality of first target words and the preset words associated with each second preset intent, screen out a second target sub-candidate intent from the plurality of second preset intents in the preset intent set according to the result of the word matching, and use the second target sub-candidate intent as the target intent.
  • the service platform is set with preset keywords.
  • the preset keywords can be set according to the intention of the current activity, or set according to the intention of the user that can be recognized by the system. According to the preset keywords, the user's intention can be directly identified. Further, a plurality of first target words are extracted from the text to be processed, the preset keywords are matched and recognized with the plurality of first target words, and it is determined whether the plurality of first target words contain the preset keywords.
  • If so, the preset keywords are matched with the preset words associated with each second preset intent, the first target sub-candidate intent is screened out from the plurality of second preset intents in the preset intent set according to the result of the word matching, and the first target sub-candidate intent is used as the target intent. In this case there is no need to match all the first target words with the preset words associated with each second preset intent, which saves computation on the service platform and improves the efficiency of intent recognition for the text to be processed. If not, the multiple first target words are each matched with the preset words associated with each second preset intent, the second target sub-candidate intent is screened out from the plurality of second preset intents in the preset intent set according to the result of the word matching, and the second target sub-candidate intent is used as the target intent.
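  • The keyword shortcut can be sketched as follows: if any first target word is a preset keyword, only the keywords are matched; otherwise every first target word is matched. The function name and the sample data are hypothetical; in particular, "refund" is an invented keyword added to the after-sales preset words for illustration.

```python
def select_target_intents(first_target_words, preset_keywords, preset_intents):
    """Return the target intents, matching only keywords when any are present."""
    keywords = [w for w in first_target_words if w in preset_keywords]
    # With a keyword present, skip matching the remaining target words.
    words_to_match = keywords if keywords else first_target_words
    return [
        intent
        for intent, preset_words in preset_intents.items()
        if any(w in preset_words for w in words_to_match)
    ]


# Hypothetical configuration.
PRESET_INTENTS = {
    "purchase": {"buy", "sell"},
    "after-sales": {"sold", "broken", "refund"},
}
PRESET_KEYWORDS = {"refund"}

with_keyword = select_target_intents(["buy", "refund"], PRESET_KEYWORDS, PRESET_INTENTS)
without_keyword = select_target_intents(["buy", "broken"], PRESET_KEYWORDS, PRESET_INTENTS)
```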
  • When a first target word is matched with the preset words associated with the second preset intents to screen out the second target sub-candidate intent, that first target word may correspond to one or more second target sub-candidate intents.
  • The preset excluded words are removed from the plurality of first target words to obtain a plurality of target words. The target words are then matched against the preset words associated with each second preset intent, the first target sub-candidate intent is screened out from the plurality of second preset intents in the preset intent set according to the result of the word matching, and the first target sub-candidate intent is used as the target intent.
  • The service platform may preset a plurality of preset excluded words for screening the plurality of first target words. When the plurality of first target words contain preset excluded words, those words are removed, the remaining first target words are matched against the preset words associated with each second preset intent, and the first target sub-candidate intent is finally screened out from the plurality of second preset intents in the preset intent set according to the result of the word matching.
  • step S110 includes:
  • S1102: Obtain the first sub-similarity between the first target words corresponding to each target intent and the text to be processed.
  • S1104: When there are multiple target intents, there are multiple first sub-similarities, and the highest of them is used as the second similarity.
  • step S112 includes:
  • the service platform can provide ways to recognize the intent of the text to be processed in various situations, and improve the ability of recognizing the intent of the text to be processed.
  • step S110 includes: segmenting the text to be processed to obtain multiple second target words of the text to be processed; obtaining the first number of first target words and the second number of second target words; and obtaining the ratio of the first number to the second number and determining the second similarity according to the ratio.
  • The text to be processed is segmented to obtain multiple second target words. The second number, that is, the number of second target words in the text to be processed, and the first number, that is, the number of first target words in the entity information, are then obtained, and the ratio of the first number to the second number is computed. This ratio is taken as the second similarity.
  • the text to be processed is "The mobile phone I bought is broken", and the entity information is "Buy” and "Broken".
  • the post-sale type corpus includes the corpus "The air conditioner I just bought is broken".
  • TriTree dictionary tree
  • corresponding models are saved respectively.
  • keywords with particularly obvious intentions such as buying, broken, and activities, as well as corresponding category words such as mobile phones, telephones, refrigerators, and air conditioners, which are used for NER picking up the text to be processed.
  • For the purchase intent, the similarity can be computed as the cosine similarity between word vectors. The text "the phone I bought is broken" is converted into word vectors, and the vectors of "buy (keyword)" and "mobile phone (category word)" are compared with the vectors of the words that remain after stop-word removal, "I", "just bought", "mobile phone" and "broken". This gives a similarity of 53% for the text to be processed under the purchase intent.
  • The text to be processed is also scored by the Text-CNN model trained on the sentence-level annotated corpus, which gives a similarity of 80% for the after-sales intent. The after-sales intent is therefore taken as the intent of the text to be processed.
  • The similar question is "The air conditioner I just bought is broken", and the knowledge point corresponding to this similar question is "after-sales maintenance".
  • This application addresses the problem of obtaining the user's final intent when the sentence-level annotated corpus is insufficient. It uses the word-level annotated corpus and the sentence-level annotated corpus together to obtain the final intent, and thereby avoids the low accuracy of intent recognition that results when only an insufficient sentence-level annotated corpus is available.
  • The present application also provides a text intent recognition device. As shown in FIG. 8, the device includes a first acquiring module 10, a second acquiring module 20, a first determining module 30, a third acquiring module 40, a fourth acquiring module 50, and a second determining module 60.
  • The first acquiring module 10 is used to obtain the text to be processed. The second acquiring module 20 is used to input the text to be processed into the text classification model to obtain the similar corpus of the text to be processed output by the model and the first similarity between the similar corpus and the text to be processed, the text classification model being trained on a corpus annotated with intents. The first determining module 30 is used to determine the first candidate intent of the text to be processed according to the similar corpus. The third acquiring module 40 is used to extract the entity information of the text to be processed and obtain the second candidate intent of the text to be processed according to the entity information. The fourth acquiring module 50 is used to obtain the second similarity between the entity information and the text to be processed. The second determining module 60 is used to select the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity.
  • When the first similarity is greater than the first preset value and less than the second preset value, the extraction operation of the third acquiring module 40 is performed, where the first preset value is less than the second preset value.
  • The text intent recognition device further includes (not shown in FIG. 8): a third determining module, used to take the first candidate intent as the final intent when the first similarity is greater than or equal to the second preset value; and/or a prompt module, used to generate prompt information when the first similarity is less than or equal to the first preset value.
  • The third acquiring module 40 includes (not shown in FIG. 8): a first acquiring unit, used to obtain a plurality of preset word types, each of which is associated with a first preset intent; a second acquiring unit, used to obtain the word search algorithm corresponding to each preset word type, the word search algorithm being used to find the words corresponding to that preset word type; an extraction unit, used to extract the words corresponding to each preset word type from the text to be processed according to the corresponding word search algorithm, to obtain multiple first target words of the text to be processed; and a generating unit, used to generate the entity information according to the multiple first target words.
  • The third acquiring module 40 includes (not shown in FIG. 8): a third acquiring unit, used to obtain a preset intent set that includes a plurality of second preset intents, each associated with multiple preset words; a fourth acquiring unit, used to obtain the multiple first target words in the entity information; and a screening unit, used to select target intents from the preset intent set according to the multiple first target words and the preset words associated with each second preset intent in the set, and to determine the second candidate intent according to the target intents.
  • The screening unit includes: a first obtaining subunit, used to obtain preset keywords; a first screening subunit, used to, when the first target words contain a preset keyword, match the preset keyword against the preset words associated with each second preset intent, select the first target sub-candidate intent from the second preset intents in the preset intent set according to the matching result, and use the first target sub-candidate intent as the target intent; and a second screening subunit, used to, when the first target words do not contain a preset keyword, match each of the first target words against the preset words associated with each second preset intent, select the second target sub-candidate intent from the second preset intents in the preset intent set according to the matching result, and use the second target sub-candidate intent as the target intent.
  • The fourth acquiring module 50 includes (not shown in FIG. 8): a fifth acquiring unit, used to obtain the first sub-similarity between the first target words corresponding to each target intent and the text to be processed; a first determining unit, used to take the highest of the first sub-similarities as the second similarity when there are multiple target intents and therefore multiple first sub-similarities; and a second determining unit, used to take the first sub-similarity as the second similarity when there is one target intent. The second determining module 60 includes: a third determining unit, used to take the first candidate intent as the final intent of the text to be processed when the first similarity is greater than or equal to the second similarity; a fourth determining unit, used to take the target intent corresponding to the second similarity as the final intent of the text to be processed when the first similarity is less than the second similarity and the second candidate intent contains multiple target intents; and a fifth determining unit, used to take the target intent in the second candidate intent as the final intent of the text to be processed when the first similarity is less than the second similarity and the second candidate intent contains one target intent.
  • the fourth acquisition module 50 includes (not shown in FIG. 8): a word segmentation unit, configured to segment the text to be processed to obtain multiple second target words of the text to be processed; and a sixth acquisition unit , For obtaining the first number of the first target words and the second number of the second target words; the sixth determining unit, for obtaining the ratio of the first number to the second number, and determining the second degree of similarity according to the ratio.
  • Each module in the above-mentioned text intention recognition device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • a computer device may be a client server supporting the operation of a service platform, and its internal structure diagram may be as shown in FIG. 9.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the network interface of the computer device is used to connect with an external terminal to read the text to be processed on the terminal.
  • the computer program is executed by the processor to implement a text intent recognition method.
  • FIG. 9 is only a block diagram of the part of the structure that is relevant to the solution of the present application, and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine some components, or arrange the components differently.
  • a computer device including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and the processor implements the following steps when the processor executes the computer program:
  • Obtaining the text to be processed; inputting the text to be processed into the text classification model to obtain the similar corpus of the text to be processed output by the text classification model and the first similarity between the similar corpus and the text to be processed, the text classification model being trained on a corpus annotated with intents; determining the first candidate intent of the text to be processed according to the similar corpus; extracting the entity information of the text to be processed and obtaining the second candidate intent of the text to be processed according to the entity information; obtaining the second similarity between the entity information and the text to be processed; and selecting the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity.
  • When the first similarity is greater than the first preset value and less than the second preset value, the processor executes the computer program to extract the entity information of the text to be processed and obtain the second candidate intent of the text to be processed according to the entity information, the first preset value being less than the second preset value. In this case, when the processor executes the computer program, the following steps are also implemented: when the first similarity is greater than or equal to the second preset value, the first candidate intent is used as the final intent; and/or, when the first similarity is less than or equal to the first preset value, prompt information is generated.
  • When the processor executes the computer program to implement the above step of extracting the entity information of the text to be processed, it specifically implements the following steps: obtaining a plurality of preset word types, each associated with a first preset intent; obtaining the word search algorithm corresponding to each preset word type, the word search algorithm being used to find the words corresponding to that preset word type; extracting the words corresponding to each preset word type from the text to be processed according to the corresponding word search algorithm, to obtain multiple first target words of the text to be processed; and generating the entity information according to the multiple first target words.
  • When the processor executes the computer program to implement the above step of obtaining the second candidate intent of the text to be processed according to the entity information, it specifically implements the following steps: obtaining a preset intent set that includes a plurality of second preset intents, each associated with multiple preset words; obtaining the multiple first target words in the entity information; selecting target intents from the preset intent set according to the multiple first target words and the preset words associated with each second preset intent in the set; and determining the second candidate intent according to the target intents.
  • When the processor executes the computer program to implement the above step of selecting target intents from the preset intent set according to the multiple first target words and the preset words associated with each second preset intent in the set, it specifically implements the following steps: obtaining preset keywords; when the first target words contain a preset keyword, matching the preset keyword against the preset words associated with each second preset intent, selecting the first target sub-candidate intent from the second preset intents in the preset intent set according to the matching result, and using the first target sub-candidate intent as the target intent; when the first target words do not contain a preset keyword, matching each of the first target words against the preset words associated with each second preset intent, selecting the second target sub-candidate intent from the second preset intents in the preset intent set according to the matching result, and using the second target sub-candidate intent as the target intent.
  • When the processor executes the computer program to implement the above step of obtaining the second similarity between the entity information and the text to be processed, it specifically implements the following steps: obtaining the first sub-similarity between the first target words corresponding to each target intent and the text to be processed; when there are multiple target intents, and therefore multiple first sub-similarities, taking the highest first sub-similarity as the second similarity; and when there is one target intent, taking the first sub-similarity as the second similarity. When the processor executes the computer program to implement the above step of selecting the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity, it specifically implements the following steps: when the first similarity is greater than or equal to the second similarity, taking the first candidate intent as the final intent of the text to be processed; when the first similarity is less than the second similarity and the second candidate intent contains multiple target intents, taking the target intent corresponding to the second similarity as the final intent of the text to be processed; and when the first similarity is less than the second similarity and the second candidate intent contains one target intent, taking that target intent as the final intent of the text to be processed.
  • When the processor executes the computer program to implement the above step of obtaining the second similarity between the entity information and the text to be processed, it specifically implements the following steps: segmenting the text to be processed to obtain multiple second target words of the text to be processed; obtaining the first number of first target words and the second number of second target words; and obtaining the ratio of the first number to the second number and determining the second similarity according to the ratio.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the following steps are implemented:
  • Obtaining the text to be processed; inputting the text to be processed into the text classification model to obtain the similar corpus of the text to be processed output by the text classification model and the first similarity between the similar corpus and the text to be processed, the text classification model being trained on a corpus annotated with intents; determining the first candidate intent of the text to be processed according to the similar corpus; extracting the entity information of the text to be processed and obtaining the second candidate intent of the text to be processed according to the entity information; obtaining the second similarity between the entity information and the text to be processed; and selecting the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity.
  • When the first similarity is greater than the first preset value and less than the second preset value, the computer program is executed by the processor to extract the entity information of the text to be processed and obtain the second candidate intent of the text to be processed according to the entity information, the first preset value being less than the second preset value. When the computer program is executed by the processor, the following steps are also implemented: when the first similarity is greater than or equal to the second preset value, the first candidate intent is used as the final intent; and/or, when the first similarity is less than or equal to the first preset value, prompt information is generated.
  • When the computer program is executed by the processor to implement the above step of extracting the entity information of the text to be processed, the following steps are specifically implemented: obtaining a plurality of preset word types, each associated with a first preset intent; obtaining the word search algorithm corresponding to each preset word type, the word search algorithm being used to find the words corresponding to that preset word type; extracting the words corresponding to each preset word type from the text to be processed according to the corresponding word search algorithm, to obtain multiple first target words of the text to be processed; and generating the entity information according to the multiple first target words.
  • When the computer program is executed by the processor to implement the above step of obtaining the second candidate intent of the text to be processed according to the entity information, the following steps are specifically implemented: obtaining a preset intent set that includes a plurality of second preset intents, each associated with multiple preset words; obtaining the multiple first target words in the entity information; selecting target intents from the preset intent set according to the multiple first target words and the preset words associated with each second preset intent in the set; and determining the second candidate intent according to the target intents.
  • When the computer program is executed by the processor to implement the above step of selecting target intents from the preset intent set according to the multiple first target words and the preset words associated with each second preset intent in the set, the following steps are specifically implemented: obtaining preset keywords; when the first target words contain a preset keyword, matching the preset keyword against the preset words associated with each second preset intent, selecting the first target sub-candidate intent from the second preset intents in the preset intent set according to the matching result, and using the first target sub-candidate intent as the target intent; when the first target words do not contain a preset keyword, matching each of the first target words against the preset words associated with each second preset intent, selecting the second target sub-candidate intent from the second preset intents in the preset intent set according to the matching result, and using the second target sub-candidate intent as the target intent.
  • When the computer program is executed by the processor to implement the above step of obtaining the second similarity between the entity information and the text to be processed, the following steps are specifically implemented: obtaining the first sub-similarity between the first target words corresponding to each target intent and the text to be processed; when there are multiple target intents, and therefore multiple first sub-similarities, taking the highest first sub-similarity as the second similarity; and when there is one target intent, taking the first sub-similarity as the second similarity. When the computer program is executed by the processor to implement the above step of selecting the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity, the following steps are specifically implemented: when the first similarity is greater than or equal to the second similarity, taking the first candidate intent as the final intent of the text to be processed; when the first similarity is less than the second similarity and the second candidate intent contains multiple target intents, taking the target intent corresponding to the second similarity as the final intent of the text to be processed; and when the first similarity is less than the second similarity and the second candidate intent contains one target intent, taking that target intent as the final intent of the text to be processed.
  • When the computer program is executed by the processor to implement the above step of obtaining the second similarity between the entity information and the text to be processed, the following steps are specifically implemented: segmenting the text to be processed to obtain multiple second target words of the text to be processed; obtaining the first number of first target words and the second number of second target words; and obtaining the ratio of the first number to the second number and determining the second similarity according to the ratio.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

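A text intent recognition method, apparatus, computer device and storage medium. The method includes: obtaining a text to be processed (S102); inputting the text to be processed into a text classification model to obtain a similar corpus of the text to be processed output by the model and a first similarity between the similar corpus and the text to be processed, the text classification model being trained on a corpus annotated with intents (S104); determining a first candidate intent of the text to be processed according to the similar corpus (S106); extracting entity information of the text to be processed and obtaining a second candidate intent of the text to be processed according to the entity information (S108); obtaining a second similarity between the entity information and the text to be processed (S110); and selecting the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity (S112). The method can improve the accuracy of intent recognition for text content.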

Description

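Text intent recognition method, apparatus, computer device and storage medium
Technical Field
This application relates to the field of text processing technology, and in particular to a text intent recognition method, apparatus, computer device and storage medium.
Background
Generally, the intent behind a piece of text can be determined from its content. Intent recognition for text content usually uses a classification method to assign a sentence to an intent category. NLU (Natural Language Understanding) is mainly responsible for extracting the content that needs to be understood from the text. In the NLU field, the conventional approach uses a single algorithm to extract the intent of the text content. Specifically, annotated corpora in a unified format are fed into one algorithm, and the intent of the text content is determined by comparing the confidence values or classification results that the algorithm outputs. In practice, however, there is often not enough annotated data. When the annotated corpus is insufficient, an algorithm trained on it to determine the intent of text content will recognize that intent with low accuracy.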
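Summary
In view of this, it is necessary to address the above technical problem by providing a text intent recognition method, apparatus, computer device and storage medium that can improve the accuracy of intent recognition for text content.
A text intent recognition method, including: obtaining a text to be processed; inputting the text to be processed into a text classification model to obtain a similar corpus of the text to be processed output by the model and a first similarity between the similar corpus and the text to be processed, the text classification model being trained on a corpus annotated with intents; determining a first candidate intent of the text to be processed according to the similar corpus; extracting entity information of the text to be processed and obtaining a second candidate intent of the text to be processed according to the entity information; obtaining a second similarity between the entity information and the text to be processed; and selecting the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity.
In one embodiment, when the first similarity is greater than a first preset value and less than a second preset value, the method proceeds to the step of extracting the entity information of the text to be processed and obtaining the second candidate intent according to the entity information, the first preset value being less than the second preset value. The method further includes: when the first similarity is greater than or equal to the second preset value, taking the first candidate intent as the final intent; and/or, when the first similarity is less than or equal to the first preset value, generating prompt information.
In one embodiment, extracting the entity information of the text to be processed includes: obtaining multiple preset word types, each associated with a first preset intent; obtaining the word search algorithm corresponding to each preset word type, the word search algorithm being used to find the words corresponding to that preset word type; extracting the words corresponding to each preset word type from the text to be processed according to the corresponding word search algorithm, to obtain multiple first target words of the text to be processed; and generating the entity information according to the multiple first target words.
In one embodiment, obtaining the second candidate intent of the text to be processed according to the entity information includes: obtaining a preset intent set that includes multiple second preset intents, each associated with multiple preset words; obtaining the multiple first target words in the entity information; selecting target intents from the preset intent set according to the multiple first target words and the preset words associated with each second preset intent in the set; and determining the second candidate intent according to the target intents.
In one embodiment, selecting target intents from the preset intent set according to the multiple first target words and the preset words associated with each second preset intent in the set includes: obtaining preset keywords; when the first target words contain a preset keyword, matching the preset keyword against the preset words associated with each second preset intent, and selecting a first target sub-candidate intent from the second preset intents in the preset intent set according to the matching result, the first target sub-candidate intent serving as the target intent; and when the first target words do not contain a preset keyword, matching each of the first target words against the preset words associated with each second preset intent, and selecting a second target sub-candidate intent from the second preset intents in the preset intent set according to the matching result, the second target sub-candidate intent serving as the target intent.
In one embodiment, obtaining the second similarity between the entity information and the text to be processed includes: obtaining the first sub-similarity between the first target words corresponding to each target intent and the text to be processed; when there are multiple target intents, and therefore multiple first sub-similarities, taking the highest first sub-similarity as the second similarity; and when there is one target intent, taking the first sub-similarity as the second similarity. Selecting the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity includes: when the first similarity is greater than or equal to the second similarity, taking the first candidate intent as the final intent of the text to be processed; when the first similarity is less than the second similarity and the second candidate intent contains multiple target intents, taking the target intent corresponding to the second similarity as the final intent of the text to be processed; and when the first similarity is less than the second similarity and the second candidate intent contains one target intent, taking that target intent as the final intent of the text to be processed.
In one embodiment, obtaining the second similarity between the entity information and the text to be processed includes: segmenting the text to be processed to obtain multiple second target words of the text to be processed; obtaining the first number of first target words and the second number of second target words; and obtaining the ratio of the first number to the second number and determining the second similarity according to the ratio.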
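A text intent recognition apparatus, including: a first acquiring module, used to obtain a text to be processed; a second acquiring module, used to input the text to be processed into a text classification model to obtain a similar corpus of the text to be processed output by the model and a first similarity between the similar corpus and the text to be processed, the text classification model being trained on a corpus annotated with intents; a first determining module, used to determine a first candidate intent of the text to be processed according to the similar corpus; a third acquiring module, used to extract entity information of the text to be processed and obtain a second candidate intent of the text to be processed according to the entity information; a fourth acquiring module, used to obtain a second similarity between the entity information and the text to be processed; and a second determining module, used to select the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity.
A computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method of any of the above embodiments when executing the computer program.
A computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method of any of the above embodiments.
With the above text intent recognition method, apparatus, computer device and storage medium, the text to be processed is first input into the text classification model to obtain a similar corpus and the first similarity between the similar corpus and the text to be processed, and a first intent of the text to be processed is determined from the similar corpus. Entity information is then extracted from the text to be processed, a second intent is obtained from the entity information, and a second similarity between the entity information and the text to be processed is obtained. Finally, the intent of the text to be processed, which is either the first intent or the second intent, is determined according to the first similarity and the second similarity. The first and second intents are thus determined separately, by the text classification model and by the entity information, and the final intent is chosen between them by comparing the two similarities. Recognizing the intent in more than one way avoids the low accuracy that a single text classification model produces when the annotated corpus is insufficient, and so improves the accuracy of intent recognition for text content.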
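Brief Description of the Drawings
FIG. 1 is a diagram of the application environment of a text intent recognition method in one embodiment;
FIG. 2 is a schematic flowchart of a text intent recognition method in one embodiment;
FIG. 3 is a schematic flowchart of a text intent recognition method in another embodiment;
FIG. 4 is a schematic flowchart of S108 in one embodiment;
FIG. 5 is a schematic flowchart of S108 in another embodiment;
FIG. 6 is a schematic flowchart of S1085 in one embodiment;
FIG. 7 is a schematic flowchart of S110 in one embodiment;
FIG. 8 is a structural block diagram of a text intent recognition apparatus in one embodiment;
FIG. 9 is an internal structure diagram of a computer device in one embodiment.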
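Detailed Description
To make the purpose, technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it.
The text intent recognition method provided by this application is applied in the application environment shown in FIG. 1. A user can exchange data with a service platform through the various applications on a terminal. In particular, the user can send question-and-answer text to the corresponding service platform through an application on the terminal and receive the reply that the service platform returns. The client server is the server that supports the service platform. The service platform receives the question-and-answer text sent by the user through the client server; this is the text to be processed. The text to be processed is then input into the text classification model to obtain the similar corpus of the text to be processed output by the model and the first similarity between the similar corpus and the text to be processed, and the first candidate intent of the text to be processed is determined from the similar corpus. In addition, the service platform extracts the entity information of the text to be processed, obtains the second candidate intent according to the entity information, and obtains the second similarity between the entity information and the text to be processed. Finally, according to the first similarity and the second similarity, the final intent of the text to be processed is selected from the first candidate intent and the second candidate intent. The final intent is the intent corresponding to the question-and-answer text sent by the user. The service platform then reads the reply that corresponds to that intent and sends it to the user's terminal. The terminal here may be a hardware device such as a computer, a tablet or a smartphone. The client server may be implemented as a single server or as a cluster of multiple servers.
In one embodiment, as shown in FIG. 2, a text intent recognition method is provided. Taking its application to the service platform in FIG. 1 (specifically, the client server that supports the service platform) as an example, the method includes the following steps: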
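S102: Obtain the text to be processed.
In this embodiment, the user sends question-and-answer text to the service platform through the terminal. The service platform receives it and takes it as the text to be processed. The text to be processed expresses the user's intent, which can be obtained by performing intent recognition on it. For example, the text to be processed may be text that expresses what the user is asking about, such as "已经提交退货申请了" ("The return request has already been submitted"), "我买的手机坏了" ("The mobile phone I bought is broken") or "我的货物到哪了" ("Where is my package").
S104: Input the text to be processed into the text classification model to obtain the similar corpus of the text to be processed output by the model and the first similarity between the similar corpus and the text to be processed, the text classification model being trained on a corpus annotated with intents.
In this embodiment, after obtaining the text to be processed, the service platform inputs it into the text classification model, which has been trained on a corpus annotated with intents. The model recognizes the text to be processed against that annotated corpus and outputs candidate similar corpora that resemble the text to be processed, together with the similarity between each candidate and the text. There may be one or more candidate similar corpora and, correspondingly, one or more similarities. When there are several candidates, the one with the highest similarity is selected as the similar corpus of the text to be processed, and that highest similarity is the first similarity between the similar corpus and the text to be processed. The text classification model may be a Text-CNN model (a convolutional model for text). When the model is trained, stop words can first be removed from the sentence-level annotated question-and-answer corpus (the corpus annotated with intents), for example meaningless words such as the modal particles 吗, 啦 and 呢. Likewise, before the text to be processed is input into the trained model, its stop words are removed, and the result is then input into the trained model to obtain the similar corpus and the first similarity. This improves the processing efficiency of the service platform.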
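The pre-processing and selection in S104 can be sketched as follows. This is only an illustration under stated assumptions: the stop-word list contains just the three particles named above, the candidate list stands in for the model output, and the similarity values are invented for the example. The function names are hypothetical and no real Text-CNN model is called.

```python
# Hypothetical stop-word list; the description names only the particles 吗, 啦 and 呢.
STOP_WORDS = {"吗", "啦", "呢"}


def remove_stop_words(tokens):
    """Drop stop words from an already segmented text."""
    return [t for t in tokens if t not in STOP_WORDS]


def pick_similar_corpus(candidates):
    """Return the (corpus, similarity) pair with the highest similarity.

    `candidates` stands in for the model output: a non-empty list of
    (candidate similar corpus, similarity) pairs.
    """
    return max(candidates, key=lambda pair: pair[1])


tokens = remove_stop_words(["手机", "坏了", "呢"])

# Illustrative scores only; they are not taken from a trained model.
similar_corpus, first_similarity = pick_similar_corpus(
    [
        ("The air conditioner I just bought is broken", 0.80),
        ("Where is my package", 0.12),
    ]
)
```

The highest-scoring candidate becomes the similar corpus, and its score is the first similarity used in the later steps.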
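S106: Determine the first candidate intent of the text to be processed according to the similar corpus.
In this embodiment, once the service platform has determined the similar corpus of the text to be processed with the text classification model, it obtains the user intent corresponding to the similar corpus and takes that intent as the first candidate intent of the text to be processed. Specifically, the service platform stores multiple corpus entries annotated with intents. After the text classification model has been trained on them, the similar corpus that the model outputs for the text to be processed already carries an intent annotation, from which the first candidate intent can be determined. Alternatively, after obtaining the similar corpus, the service platform obtains the corresponding standard corpus and determines the first candidate intent from it. The standard corpus is annotated with an intent, so the first candidate intent of the text to be processed can be determined from the standard question.
For example, the service platform stores a question-and-answer corpus with sentence-level annotations (annotated with intents), such as an after-sales corpus. For the after-sales corpus, the standard questions (the sentence-level annotated question-and-answer corpus) and the similar questions are as follows: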
Figure PCTCN2020097006-appb-000001
Figure PCTCN2020097006-appb-000002
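Here, the intent field corresponds to the standard question and the text field corresponds to the similar question. Standard questions and similar questions are in a one-to-many relationship. After the similar question with the highest similarity has been obtained, the final result is obtained by looking up the answer to the corresponding standard question.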
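The one-to-many relationship between the `intent` field and the `text` field can be illustrated with a small sketch. The records below are invented for illustration, since the original figures are not reproduced here; only the field names `intent` and `text` come from the description, and the function names are hypothetical.

```python
from collections import defaultdict

# Illustrative records only; the real corpus in the original figures is not shown.
RECORDS = [
    {"intent": "after-sales maintenance", "text": "The air conditioner I just bought is broken"},
    {"intent": "after-sales maintenance", "text": "The mobile phone I bought is broken"},
    {"intent": "logistics query", "text": "Where is my package"},
]


def group_by_standard_question(records):
    """Map each standard question (intent) to all of its similar questions (text)."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["intent"]].append(record["text"])
    return dict(grouped)


def standard_question_for(similar_question, records):
    """Look up the standard question that a similar question belongs to."""
    for record in records:
        if record["text"] == similar_question:
            return record["intent"]
    return None


grouped = group_by_standard_question(RECORDS)
found = standard_question_for("The air conditioner I just bought is broken", RECORDS)
```

Once the best-matching similar question is known, the reverse lookup gives the standard question whose answer is returned.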
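S108: Extract the entity information of the text to be processed, and obtain the second candidate intent of the text to be processed according to the entity information.
In this embodiment, the service platform extracts the entity information of the text to be processed. The entity information may be made up of words segmented from the text to be processed, for example category words, brand words, hot words and keywords. The entity information may also be determined from the content of the text to be processed, for example by determining the semantics of the text and using those semantics as the entity information.
The service platform then obtains the second candidate intent of the text to be processed according to the entity information. Specifically, the service platform holds multiple preset intents, each with associated information. The second candidate intent of the text to be processed can be determined from how the entity information matches the associated information of each preset intent.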
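S110: Obtain the second similarity between the entity information and the text to be processed.
In this embodiment, the second similarity may be the semantic similarity between the entity information and the text to be processed. When the entity information consists of one or more words extracted from the text to be processed, the second similarity may also be determined from the proportion of those words in the text to be processed. The second similarity represents how similar the entity information is to the text to be processed.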
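The proportion-based variant of the second similarity can be sketched as below, following the embodiment in which the number of first target words is divided by the number of words in the segmented text. The function name is hypothetical, and the segmentation of "我买的手机坏了" shown here is an assumed one, not the output of any particular segmenter.

```python
def ratio_similarity(first_target_words, second_target_words):
    """Second similarity as (number of first target words) / (number of segmented words)."""
    if not second_target_words:
        return 0.0  # an empty text has no meaningful ratio
    return len(first_target_words) / len(second_target_words)


# Entity information extracted from the text (first target words).
first_target_words = ["买", "坏了"]
# Assumed segmentation of the text "我买的手机坏了" (second target words).
second_target_words = ["我", "买", "的", "手机", "坏了"]

second_similarity = ratio_similarity(first_target_words, second_target_words)
```

With two entity words out of five segmented words, the ratio is 0.4.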
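S112: Select the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity.
In this embodiment, the service platform determines the first similarity and the first candidate intent with the text classification model, determines the second similarity and the second candidate intent from the entity information of the text to be processed, and then selects the final intent from the two candidates according to the two similarities. When the first similarity is greater than or equal to the second similarity, the final intent is the first candidate intent. When the first similarity is less than the second similarity, the final intent is the second candidate intent. Comparing the similarities obtained in the two ways and taking the candidate intent with the larger similarity as the final intent makes the result more accurate and avoids the low accuracy of determining the intent in only one way.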
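The comparison in S112 can be sketched as follows, combined with the embodiment in which several target intents each have a first sub-similarity and the highest one is used as the second similarity. The function names and the dictionary representation are assumptions made for illustration; the 80% and 53% values echo the after-sales and purchase example given elsewhere in this document.

```python
def second_candidate(sub_similarities):
    """Pick the target intent with the highest first sub-similarity.

    `sub_similarities` is a non-empty dict: target intent -> first sub-similarity.
    Returns (target intent, second similarity).
    """
    intent = max(sub_similarities, key=sub_similarities.get)
    return intent, sub_similarities[intent]


def select_final_intent(first_candidate, first_similarity, sub_similarities):
    """Return the first candidate on a tie or win, otherwise the best target intent."""
    intent, second_similarity = second_candidate(sub_similarities)
    if first_similarity >= second_similarity:
        return first_candidate
    return intent


final_intent = select_final_intent("after-sales", 0.80, {"purchase": 0.53})
```

Because 0.80 is not less than 0.53, the first candidate ("after-sales") is kept as the final intent.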
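With the above text intent recognition method, the text to be processed is first input into the text classification model to obtain a similar corpus and the first similarity between the similar corpus and the text to be processed, and a first intent of the text to be processed is determined from the similar corpus. Entity information is then extracted from the text to be processed, a second intent is obtained from the entity information, and a second similarity between the entity information and the text to be processed is obtained. Finally, the intent of the text to be processed, which is either the first intent or the second intent, is determined according to the first similarity and the second similarity. The first and second intents are thus determined separately, by the text classification model and by the entity information, and the final intent is chosen between them by comparing the two similarities. Recognizing the intent in more than one way avoids the low accuracy that a single text classification model produces when the annotated corpus is insufficient, and so improves the accuracy of intent recognition for text content.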
在一个实施例中,如图3所示,在进入步骤S108之前,服务平台设置了前置条件。前置条件为第一相似度大于第一预设值且小于第二预设值。其中,第一预设值小于第二预设值。在第一相似度大于第一预设值且小于第二预设值时,进入步骤S108。当不满足前置条件时,分两种情况。情况一,参见步骤S1074:在第一相似度大于或等于第二预设值时,将第一候选意图作为最终意图。情况二,参见步骤S1072:在第一相似度小于或等于第一预设值时,生成提示信息。
具体地,将去停用词后的待处理文本,使用已训练的文本分类模型进行分类识别后,获得模型输出的候选相似语料及候选相似语料与待处理文本之间的相似度。其中,候选相似语料为多条,候选相似语料与待处理文本之间的相似度也为多个,并根据相似度的大小对候选相似语料进行排序。进一步地,服务平台获取相似度最高的候选相似语料,若该相似度最高的候选相似语料对应相似度大于或等于第二预设值(如95%)时,直接根据该候选相似语料对应的意图作为第一候选意图,此时程序终止,无需再执行步骤S108。若该相似度最高的候选相似语料对应相似度大于第一预设值(如60%)且小于第二预设值时,执行步骤S108。若该相似度最高的候选相似语料对应相似度小于或等于第一预设值时,生成提示信息,此时也无需再执行步骤S108。因此,可以提高服务平台对待处理文本的意图识别能力。
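The three-way routing on the first similarity can be sketched as below. The default thresholds 0.60 and 0.95 are the example values given in the text; the `Route` enum and the function name are assumptions made for this illustration.

```python
from enum import Enum


class Route(Enum):
    ACCEPT_FIRST = "take the first candidate intent as the final intent (S1074)"
    RUN_ENTITY_BRANCH = "continue with step S108"
    PROMPT = "generate prompt information (S1072)"


def route_by_first_similarity(first_sim, low=0.60, high=0.95):
    """Decide what to do after the classification model has answered.

    `low` is the first preset value and `high` the second preset value;
    the description requires low < high.
    """
    if not low < high:
        raise ValueError("first preset value must be less than the second")
    if first_sim >= high:
        return Route.ACCEPT_FIRST
    if first_sim <= low:
        return Route.PROMPT
    return Route.RUN_ENTITY_BRANCH
```

With these example thresholds, a first similarity of 0.80 falls strictly between the two preset values, so the entity branch (step S108) would run.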
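In one embodiment, as shown in Fig. 4, step S108 includes:
S1082: obtain multiple preset word types, each preset word type being associated with a first preset intent.
S1084: obtain the word lookup algorithm corresponding to each preset word type, the word lookup algorithm being used to find words of that preset word type.
S1086: extract the words of each preset word type from the text to be processed according to the corresponding word lookup algorithm, to obtain multiple first target words of the text to be processed.
S1088: generate the entity information according to the multiple first target words.
In this embodiment, multiple preset word types are configured in advance on the service platform, each associated with corresponding first preset intents. For example, the preset word types include category words, hot words, brand words and keywords, each of which corresponds to one or more first preset intents. Each preset word type also has a word lookup algorithm for finding words of that type. The service platform uses these algorithms to extract the words of each preset word type from the text to be processed, obtaining multiple first target words. The preset word types may share the same word lookup algorithm, which may be a trie (dictionary tree) lookup algorithm. Finally, the entity information is generated from the first target words; it may include the first target words themselves, or it may be other information derived from them that does not include them. This can improve the service platform's ability to extract entity information. For example, when generating the entity information, the text to be processed is segmented and named entity recognition (NER) is performed on the segmentation result using the word-level corpora to pick up the entity information, which may include categories, brands, hot words, keywords and so on.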
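A trie-based lookup of the kind mentioned above might look like the following sketch. It scans the raw text with a longest-match rule, one trie per preset word type; the class, the `"$end"` marker and the vocabularies are assumptions for illustration, and a production system would likely segment the text first as the description suggests.

```python
class Trie:
    """Minimal character trie; "$end" marks the end of a stored word."""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$end"] = True

    def longest_match(self, text, start):
        """Return the longest stored word beginning at text[start], or None."""
        node, end = self.root, None
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if "$end" in node:
                end = i + 1
        return text[start:end] if end is not None else None


def extract_entities(text, tries):
    """Scan `text` once per word type and collect (word, word_type) pairs."""
    found = []
    for word_type, trie in tries.items():
        i = 0
        while i < len(text):
            word = trie.longest_match(text, i)
            if word:
                found.append((word, word_type))
                i += len(word)
            else:
                i += 1
    return found
```

Given a keyword trie holding 买 and 坏了 and a category trie holding 手机 and 空调, scanning 我买的手机坏了 yields the keywords 买 and 坏了 and the category word 手机.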
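In one embodiment, as shown in Fig. 5, step S108 further includes:
S1081: obtain a preset intent set, the preset intent set including multiple second preset intents, each second preset intent being associated with multiple preset words.
S1083: obtain the multiple first target words in the entity information.
S1085: select target intents from the preset intent set according to the multiple first target words and the preset words associated with each second preset intent, and determine the second candidate intent according to the target intents.
In this embodiment, the service platform is configured in advance with a preset intent set containing multiple second preset intents, each associated with multiple preset words. For example, when the second preset intent is a purchase intent, its preset words may include “买” (buy), “购” (purchase) and “售” (sell); when it is an after-sales intent, its preset words may include “卖” (sell) and “坏了” (broken). Through the association between preset words and second preset intents, target intents can be selected from the preset intent set according to the first target words. There may be one or more target intents, from which the service platform determines the second candidate intent. Selecting target intents with the first target words of the entity information in this way lets the service platform obtain the second candidate intent quickly.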
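As a rough sketch of this matching, the preset intent set can be held as a mapping from intent to its preset words, and the target intents are those whose preset words intersect the first target words. The preset words are the examples from the text; the intent identifiers and the function name are hypothetical.

```python
# Preset words taken from the examples in the description; the English
# intent identifiers are made up for this sketch.
PRESET_INTENTS = {
    "purchase": {"买", "购", "售"},
    "after_sales": {"卖", "坏了"},
}


def filter_target_intents(first_target_words, preset_intents=PRESET_INTENTS):
    """Return {intent: matched words} for every intent whose preset words
    overlap the first target words."""
    words = set(first_target_words)
    return {
        intent: sorted(words & vocab)
        for intent, vocab in preset_intents.items()
        if words & vocab
    }
```

Keeping the matched words per intent is a design choice for the sketch: they are what a later step would compare with the text to obtain a per-intent similarity.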
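In one embodiment, as shown in Fig. 6, step S1085 includes:
S10852: obtain a preset keyword.
S10854: when the multiple first target words contain the preset keyword, match the preset keyword against the preset words associated with each second preset intent, select a first target sub-candidate intent from the multiple second preset intents of the preset intent set according to the matching result, and take the first target sub-candidate intent as the target intent.
S10856: when the multiple first target words do not contain the preset keyword, match each of the first target words against the preset words associated with each second preset intent, select a second target sub-candidate intent from the multiple second preset intents of the preset intent set according to the matching result, and take the second target sub-candidate intent as the target intent.
In this embodiment, the service platform is configured with preset keywords. A preset keyword may be set according to the intent of a current campaign or according to the user intents the system can recognize, and a user intent can be identified directly from it. The first target words extracted from the text to be processed are checked against the preset keywords. If the first target words contain a preset keyword, only that keyword is matched against the preset words associated with each second preset intent, and the first target sub-candidate intent selected from the matching result is taken as the target intent. Not every first target word then has to be matched, which saves computation and makes intent recognition more efficient. If they contain no preset keyword, each first target word is matched against the preset words associated with each second preset intent, and the second target sub-candidate intent selected from the matching result is taken as the target intent. In this case a first target word may correspond to one or more second target sub-candidate intents.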
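The two branches S10854 and S10856 can be sketched as one function: if any first target word is a preset keyword, only those keywords are matched; otherwise every first target word is matched. All names and the sample data are assumptions for illustration.

```python
def target_intents(first_target_words, preset_keywords, preset_intents):
    """Return the intents selected by keyword shortcut or by full matching.

    preset_intents maps an intent name to its set of preset words.
    """
    # S10854: restrict matching to the preset keywords that were found.
    hits = [w for w in first_target_words if w in preset_keywords]
    # S10856: fall back to matching every first target word.
    words_to_match = hits if hits else list(first_target_words)
    return [
        intent
        for intent, vocab in preset_intents.items()
        if any(w in vocab for w in words_to_match)
    ]
```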
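A specific scenario for obtaining the target intent from preset keywords is given below:
Keyword filtering is used for a first screening of the intent of the text to be processed. Specifically, a subset of the intent list supported by the customer service system is obtained. Suppose the system currently supports four intents: after-sales, shopping guide, campaign query and coupon query. Filtering the text to be processed “我买的手机坏了” (“the phone I bought is broken”) by keyword-type named entity recognition (NER) yields the two keywords “买” (buy) and “坏了” (broken), which map to the shopping guide and after-sales intents respectively. The subsequent cosine similarity calculation then only needs to compare the similarities of these two intents, which saves some extra computation.
In one embodiment, when the multiple first target words contain preset exclusion words, the preset exclusion words may be removed from the first target words to obtain multiple object words. The object words are matched against the preset words associated with each second preset intent, a first target sub-candidate intent is selected from the multiple second preset intents of the preset intent set according to the matching result, and the first target sub-candidate intent is taken as the target intent.
In this embodiment, the service platform may configure multiple preset exclusion words in advance for filtering the first target words. When the first target words contain preset exclusion words, those words are removed, the remaining first target words are matched against the preset words associated with each second preset intent, and the first target sub-candidate intent is selected from the preset intent set according to the matching result.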
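In one embodiment, as shown in Fig. 7, step S110 includes:
S1102: obtain a first sub-similarity between the first target words corresponding to a target intent and the text to be processed.
S1104: when there are multiple target intents, there are multiple first sub-similarities, and the highest of them is taken as the second similarity.
S1106: when there is one target intent, the first sub-similarity is taken as the second similarity.
In this case, step S112 includes:
S1122: when the first similarity is greater than or equal to the second similarity, take the first candidate intent as the final intent of the text to be processed.
S1124: when the first similarity is less than the second similarity and the second candidate intent contains multiple target intents, take the target intent corresponding to the second similarity as the final intent of the text to be processed.
S1126: when the first similarity is less than the second similarity and the second candidate intent contains one target intent, take that target intent as the final intent of the text to be processed.
In this embodiment, when multiple target intents are determined from the first target words, there are correspondingly multiple first sub-similarities. The highest of them is taken as the second similarity, and the target intent corresponding to it is taken as the second candidate intent. When only one target intent is determined, no further selection is needed: its first sub-similarity is the second similarity and the target intent itself is the second candidate intent. When the final intent is selected in step S112, the first candidate intent is taken as the final intent if the first similarity is greater than or equal to the second similarity. If the first similarity is less than the second similarity and the second candidate intent contains multiple target intents, the target intent corresponding to the second similarity is taken as the final intent; if it contains one target intent, that target intent is taken as the final intent. The service platform thus provides intent recognition paths for several situations, improving its ability to recognize the intent of the text to be processed.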
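Steps S1102 to S1126 can be condensed into a short sketch, assuming the first sub-similarities have already been computed and are passed in as a mapping from target intent to similarity. The function names and the sample numbers are illustrative.

```python
def second_candidate(sub_similarities):
    """Return (target intent, second similarity).

    With several target intents the highest first sub-similarity wins
    (S1104); with a single one it is used as is (S1106).
    """
    if not sub_similarities:
        raise ValueError("at least one target intent is required")
    intent = max(sub_similarities, key=sub_similarities.get)
    return intent, sub_similarities[intent]


def final_intent(first_intent, first_sim, sub_similarities):
    """Steps S1122 to S1126: compare the first similarity with the second."""
    second_intent, second_sim = second_candidate(sub_similarities)
    return first_intent if first_sim >= second_sim else second_intent
```

Because the maximum over a single entry is that entry, the one-intent and multi-intent cases share the same code path in this sketch.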
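In one embodiment, step S110 includes: performing word segmentation on the text to be processed to obtain multiple second target words; obtaining a first number, the number of first target words, and a second number, the number of second target words; and obtaining the ratio of the first number to the second number and determining the second similarity according to the ratio.
In this embodiment, to obtain the second similarity between the entity information and the text to be processed, the text is segmented into multiple second target words. The second number of second target words in the text and the first number of first target words in the entity information are obtained, and the ratio of the first number to the second number is taken as the second similarity. For example, if the text to be processed is “我买的手机坏了” and the entity information is “买” and “坏了”, the text segments into five words (我 / 买 / 的 / 手机 / 坏了), so the similarity is (2/5) × 100% = 40%.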
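The ratio-based second similarity is a one-line computation. The sketch below assumes the text has already been segmented by some tokenizer (the segmentation tool is not named in the description), and the function name is illustrative.

```python
def ratio_similarity(first_target_words, second_target_words):
    """Second similarity as (number of entity words) / (number of words in
    the segmented text). Returns 0.0 for an empty text."""
    if not second_target_words:
        return 0.0
    return len(first_target_words) / len(second_target_words)
```

With the entity words 买 and 坏了 and the five-word segmentation 我 / 买 / 的 / 手机 / 坏了, the function returns 0.4, matching the 40% in the example.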
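For the text intent recognition method described in the above embodiments, a specific example is given below, using the text to be processed “我买的手机坏了” (“the phone I bought is broken”).
First, a Text-CNN model is trained on the sentence-level annotated after-sales corpora and the resulting model is stored. The after-sales corpora include the entry “我刚买的空调坏了” (“the air conditioner I just bought is broken”).
Second, according to the word-level annotations in the system, words of different types are trained with the trie (dictionary tree) algorithm and the corresponding models are saved separately. These include keywords with a particularly clear intent tendency, such as 买 (buy), 坏了 (broken) and 活动 (campaign), and category words such as 手机 (mobile phone), 电话 (telephone), 冰箱 (refrigerator) and 空调 (air conditioner), and are used for NER pickup on the text to be processed.
Next, a similarity algorithm is designed for each intent. For the purchase intent, for example, the similarity can be computed as the cosine similarity of word vectors: “我买的手机坏了” is converted into word-vector form, “买 (keyword)” and “手机 (category word)”, and compared with the word vector of the text to be processed after stop-word removal, “我”, “刚买”, “手机”, “坏了”, which gives a similarity of 53% for this text under the purchase intent.
Finally, the text to be processed is run through the Text-CNN model trained on the sentence-level annotated corpora, giving a similarity of 80% for the after-sales intent. The intent of the text is therefore the after-sales intent, the similar question is “我刚买的空调坏了”, and the knowledge point corresponding to that similar question is after-sales maintenance (售后维保).
Comparing the two similarities (80% versus 53%), the intent of the text to be processed is after-sales maintenance.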
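The example mentions cosine similarity of word vectors without specifying how the vectors are built. One simple reading, shown below purely as an assumption, treats each side as a bag-of-words count vector. This sketch does not reproduce the 53% figure from the example, which presumably depends on word embeddings or weights the description does not give.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts as
    the vector components (an assumption; the source does not say how the
    word vectors are obtained)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

The per-intent score obtained this way would then be compared with the first similarity from the classification model, as in the 80% versus 53% comparison above.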
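This application therefore addresses the problem of obtaining the user's final intent when sentence-level annotated corpora are insufficient, by letting word-level and sentence-level annotated corpora work together, and so avoids the low accuracy of user intent recognition that insufficient sentence-level annotated corpora would otherwise cause.
It should be understood that although the steps in the flowcharts are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times, and not necessarily sequentially: they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.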
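This application further provides a text intent recognition apparatus. As shown in Fig. 8, the apparatus includes a first obtaining module 10, a second obtaining module 20, a first determining module 30, a third obtaining module 40, a fourth obtaining module 50 and a second determining module 60.
The first obtaining module 10 is configured to obtain the text to be processed. The second obtaining module 20 is configured to input the text to be processed into the text classification model and obtain the similar corpus output by the model and the first similarity between the similar corpus and the text to be processed, the text classification model being trained on corpora annotated with intents. The first determining module 30 is configured to determine the first candidate intent of the text to be processed according to the similar corpus. The third obtaining module 40 is configured to extract entity information from the text to be processed and obtain the second candidate intent according to the entity information. The fourth obtaining module 50 is configured to obtain the second similarity between the entity information and the text to be processed. The second determining module 60 is configured to select the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity.
In one embodiment, the extraction operation of the third obtaining module 40 is performed when the first similarity is greater than the first preset value and less than the second preset value, the first preset value being less than the second preset value. The apparatus further includes (not shown in Fig. 8): a third determining module, configured to take the first candidate intent as the final intent when the first similarity is greater than or equal to the second preset value; and/or a prompt module, configured to generate prompt information when the first similarity is less than or equal to the first preset value.
In one embodiment, the third obtaining module 40 includes (not shown in Fig. 8): a first obtaining unit, configured to obtain multiple preset word types, each associated with a first preset intent; a second obtaining unit, configured to obtain the word lookup algorithm corresponding to each preset word type, the algorithm being used to find words of that type; an extraction unit, configured to extract the words of each preset word type from the text to be processed according to the corresponding word lookup algorithm, obtaining multiple first target words; and a generating unit, configured to generate the entity information according to the first target words.
In one embodiment, the third obtaining module 40 includes (not shown in Fig. 8): a third obtaining unit, configured to obtain the preset intent set, which includes multiple second preset intents, each associated with multiple preset words; a fourth obtaining unit, configured to obtain the multiple first target words in the entity information; and a selecting unit, configured to select target intents from the preset intent set according to the first target words and the preset words associated with each second preset intent, and to determine the second candidate intent according to the target intents.
In one embodiment, the selecting unit includes: a first obtaining subunit, configured to obtain a preset keyword; a first selecting subunit, configured to, when the first target words contain the preset keyword, match the preset keyword against the preset words associated with each second preset intent, select a first target sub-candidate intent from the second preset intents of the preset intent set according to the matching result, and take it as the target intent; and a second selecting subunit, configured to, when the first target words do not contain the preset keyword, match each first target word against the preset words associated with each second preset intent, select a second target sub-candidate intent from the second preset intents according to the matching result, and take it as the target intent.
In one embodiment, the fourth obtaining module 50 includes (not shown in Fig. 8): a fifth obtaining unit, configured to obtain the first sub-similarity between the first target words corresponding to a target intent and the text to be processed; a first determining unit, configured to, when there are multiple target intents and hence multiple first sub-similarities, take the highest first sub-similarity as the second similarity; and a second determining unit, configured to take the first sub-similarity as the second similarity when there is one target intent. The second determining module 60 includes: a third determining unit, configured to take the first candidate intent as the final intent when the first similarity is greater than or equal to the second similarity; a fourth determining unit, configured to take the target intent corresponding to the second similarity as the final intent when the first similarity is less than the second similarity and the second candidate intent contains multiple target intents; and a fifth determining unit, configured to take the target intent in the second candidate intent as the final intent when the first similarity is less than the second similarity and the second candidate intent contains one target intent.
In one embodiment, the fourth obtaining module 50 includes (not shown in Fig. 8): a word segmentation unit, configured to segment the text to be processed into multiple second target words; a sixth obtaining unit, configured to obtain the first number of first target words and the second number of second target words; and a sixth determining unit, configured to obtain the ratio of the first number to the second number and determine the second similarity according to the ratio.
For the specific limitations of the text intent recognition apparatus, refer to the limitations of the text intent recognition method above, which are not repeated here. Each module of the apparatus may be implemented wholly or partly in software, hardware or a combination of the two. The modules may be embedded in or independent of the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke them and perform the corresponding operations.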
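In one embodiment, a computer device is provided. It may be a server supporting the operation of the service platform, and its internal structure may be as shown in Fig. 9. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface is used to connect to an external terminal in order to read the text to be processed from the terminal. When executed by the processor, the computer program implements a text intent recognition method.
Those skilled in the art will understand that the structure shown in Fig. 9 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.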
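In one embodiment, a computer device is provided, including a memory, a processor and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the following steps:
obtaining the text to be processed; inputting the text to be processed into the text classification model to obtain the similar corpus output by the model and the first similarity between the similar corpus and the text to be processed, the text classification model being trained on corpora annotated with intents; determining the first candidate intent of the text to be processed according to the similar corpus; extracting entity information from the text to be processed and obtaining the second candidate intent according to the entity information; obtaining the second similarity between the entity information and the text to be processed; and selecting the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity.
In one embodiment, the processor executes the step of extracting entity information and obtaining the second candidate intent when the first similarity is greater than the first preset value and less than the second preset value, the first preset value being less than the second preset value. The processor then further implements the following steps: taking the first candidate intent as the final intent when the first similarity is greater than or equal to the second preset value; and/or generating prompt information when the first similarity is less than or equal to the first preset value.
In one embodiment, when implementing the step of extracting entity information from the text to be processed, the processor specifically: obtains multiple preset word types, each associated with a first preset intent; obtains the word lookup algorithm corresponding to each preset word type, the algorithm being used to find words of that type; extracts the words of each preset word type from the text to be processed according to the corresponding word lookup algorithm, obtaining multiple first target words; and generates the entity information according to the first target words.
In one embodiment, when implementing the step of obtaining the second candidate intent of the text to be processed according to the entity information, the processor specifically: obtains the preset intent set, which includes multiple second preset intents, each associated with multiple preset words; obtains the multiple first target words in the entity information; and selects target intents from the preset intent set according to the first target words and the preset words associated with each second preset intent, and determines the second candidate intent according to the target intents.
In one embodiment, when implementing the step of selecting target intents from the preset intent set according to the first target words and the preset words associated with each second preset intent, the processor specifically: obtains a preset keyword; when the first target words contain the preset keyword, matches the preset keyword against the preset words associated with each second preset intent, selects a first target sub-candidate intent from the second preset intents of the preset intent set according to the matching result, and takes it as the target intent; and when the first target words do not contain the preset keyword, matches each first target word against the preset words associated with each second preset intent, selects a second target sub-candidate intent from the second preset intents according to the matching result, and takes it as the target intent.
In one embodiment, when implementing the step of obtaining the second similarity between the entity information and the text to be processed, the processor specifically: obtains the first sub-similarity between the first target words corresponding to a target intent and the text to be processed; when there are multiple target intents and hence multiple first sub-similarities, takes the highest first sub-similarity as the second similarity; and when there is one target intent, takes the first sub-similarity as the second similarity. When implementing the step of selecting the final intent from the first candidate intent and the second candidate intent according to the first similarity and the second similarity, the processor specifically: takes the first candidate intent as the final intent when the first similarity is greater than or equal to the second similarity; takes the target intent corresponding to the second similarity as the final intent when the first similarity is less than the second similarity and the second candidate intent contains multiple target intents; and takes the target intent in the second candidate intent as the final intent when the first similarity is less than the second similarity and the second candidate intent contains one target intent.
In one embodiment, when implementing the step of obtaining the second similarity between the entity information and the text to be processed, the processor specifically: segments the text to be processed into multiple second target words; obtains the first number of first target words and the second number of second target words; and obtains the ratio of the first number to the second number and determines the second similarity according to the ratio.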
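In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the following steps:
obtaining the text to be processed; inputting the text to be processed into the text classification model to obtain the similar corpus output by the model and the first similarity between the similar corpus and the text to be processed, the text classification model being trained on corpora annotated with intents; determining the first candidate intent of the text to be processed according to the similar corpus; extracting entity information from the text to be processed and obtaining the second candidate intent according to the entity information; obtaining the second similarity between the entity information and the text to be processed; and selecting the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity.
In one embodiment, the computer program, when executed by the processor, implements the step of extracting entity information and obtaining the second candidate intent when the first similarity is greater than the first preset value and less than the second preset value, the first preset value being less than the second preset value. The computer program then further implements the following steps: taking the first candidate intent as the final intent when the first similarity is greater than or equal to the second preset value; and/or generating prompt information when the first similarity is less than or equal to the first preset value.
In one embodiment, when implementing the step of extracting entity information from the text to be processed, the computer program executed by the processor specifically: obtains multiple preset word types, each associated with a first preset intent; obtains the word lookup algorithm corresponding to each preset word type, the algorithm being used to find words of that type; extracts the words of each preset word type from the text to be processed according to the corresponding word lookup algorithm, obtaining multiple first target words; and generates the entity information according to the first target words.
In one embodiment, when implementing the step of obtaining the second candidate intent of the text to be processed according to the entity information, the computer program executed by the processor specifically: obtains the preset intent set, which includes multiple second preset intents, each associated with multiple preset words; obtains the multiple first target words in the entity information; and selects target intents from the preset intent set according to the first target words and the preset words associated with each second preset intent, and determines the second candidate intent according to the target intents.
In one embodiment, when implementing the step of selecting target intents from the preset intent set according to the first target words and the preset words associated with each second preset intent, the computer program executed by the processor specifically: obtains a preset keyword; when the first target words contain the preset keyword, matches the preset keyword against the preset words associated with each second preset intent, selects a first target sub-candidate intent from the second preset intents of the preset intent set according to the matching result, and takes it as the target intent; and when the first target words do not contain the preset keyword, matches each first target word against the preset words associated with each second preset intent, selects a second target sub-candidate intent from the second preset intents according to the matching result, and takes it as the target intent.
In one embodiment, when implementing the step of obtaining the second similarity between the entity information and the text to be processed, the computer program executed by the processor specifically: obtains the first sub-similarity between the first target words corresponding to a target intent and the text to be processed; when there are multiple target intents and hence multiple first sub-similarities, takes the highest first sub-similarity as the second similarity; and when there is one target intent, takes the first sub-similarity as the second similarity. When implementing the step of selecting the final intent from the first candidate intent and the second candidate intent according to the first similarity and the second similarity, the computer program specifically: takes the first candidate intent as the final intent when the first similarity is greater than or equal to the second similarity; takes the target intent corresponding to the second similarity as the final intent when the first similarity is less than the second similarity and the second candidate intent contains multiple target intents; and takes the target intent in the second candidate intent as the final intent when the first similarity is less than the second similarity and the second candidate intent contains one target intent.
In one embodiment, when implementing the step of obtaining the second similarity between the entity information and the text to be processed, the computer program executed by the processor specifically: segments the text to be processed into multiple second target words; obtains the first number of first target words and the second number of second target words; and obtains the ratio of the first number to the second number and determines the second similarity according to the ratio.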
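Those of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application. Their description is relatively specific and detailed, but should not be understood as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of this application, all of which fall within its scope of protection. The scope of protection of this patent is therefore subject to the appended claims.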

Claims (10)

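  1. A text intent recognition method, the method comprising:
    obtaining a text to be processed;
    inputting the text to be processed into a text classification model to obtain a similar corpus of the text to be processed output by the text classification model and a first similarity between the similar corpus and the text to be processed, the text classification model being trained on corpora annotated with intents;
    determining a first candidate intent of the text to be processed according to the similar corpus;
    extracting entity information from the text to be processed, and obtaining a second candidate intent of the text to be processed according to the entity information;
    obtaining a second similarity between the entity information and the text to be processed; and
    selecting a final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity.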
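  2. The method according to claim 1, wherein the step of extracting the entity information from the text to be processed and obtaining the second candidate intent of the text to be processed according to the entity information is entered when the first similarity is greater than a first preset value and less than a second preset value, the first preset value being less than the second preset value;
    the method further comprising:
    taking the first candidate intent as the final intent when the first similarity is greater than or equal to the second preset value;
    and/or generating prompt information when the first similarity is less than or equal to the first preset value.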
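  3. The method according to claim 1, wherein extracting the entity information from the text to be processed comprises:
    obtaining multiple preset word types, each preset word type being associated with a first preset intent;
    obtaining a word lookup algorithm corresponding to each preset word type, the word lookup algorithm being used to find words of that preset word type;
    extracting the words of each preset word type from the text to be processed according to the corresponding word lookup algorithm, to obtain multiple first target words of the text to be processed; and
    generating the entity information according to the multiple first target words.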
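  4. The method according to claim 3, wherein obtaining the second candidate intent of the text to be processed according to the entity information comprises:
    obtaining a preset intent set, the preset intent set including multiple second preset intents, each second preset intent being associated with multiple preset words;
    obtaining the multiple first target words in the entity information; and
    selecting a target intent from the preset intent set according to the multiple first target words and the preset words associated with each second preset intent in the preset intent set, and determining the second candidate intent according to the target intent.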
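  5. The method according to claim 4, wherein selecting the target intent from the preset intent set according to the multiple first target words and the preset words associated with each second preset intent in the preset intent set comprises:
    obtaining a preset keyword;
    when the multiple first target words contain the preset keyword, matching the preset keyword against the preset words associated with each second preset intent, selecting a first target sub-candidate intent from the multiple second preset intents of the preset intent set according to the matching result, and taking the first target sub-candidate intent as the target intent; and
    when the multiple first target words do not contain the preset keyword, matching each of the multiple first target words against the preset words associated with each second preset intent, selecting a second target sub-candidate intent from the multiple second preset intents of the preset intent set according to the matching result, and taking the second target sub-candidate intent as the target intent.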
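  6. The method according to claim 5, wherein obtaining the second similarity between the entity information and the text to be processed comprises:
    obtaining a first sub-similarity between the first target words corresponding to the target intent and the text to be processed;
    when there are multiple target intents, there being multiple first sub-similarities, taking the highest of the multiple first sub-similarities as the second similarity;
    when there is one target intent, taking the first sub-similarity as the second similarity;
    and wherein selecting the final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity comprises:
    when the first similarity is greater than or equal to the second similarity, taking the first candidate intent as the final intent of the text to be processed;
    when the first similarity is less than the second similarity and the second candidate intent contains multiple target intents, taking the target intent corresponding to the second similarity as the final intent of the text to be processed;
    when the first similarity is less than the second similarity and the second candidate intent contains one target intent, taking the target intent in the second candidate intent as the final intent of the text to be processed.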
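  7. The method according to claim 4, wherein obtaining the second similarity between the entity information and the text to be processed comprises:
    performing word segmentation on the text to be processed to obtain multiple second target words of the text to be processed;
    obtaining a first number of the first target words and a second number of the second target words; and
    obtaining the ratio of the first number to the second number, and determining the second similarity according to the ratio.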
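  8. A text intent recognition apparatus, wherein the apparatus comprises:
    a first obtaining module, configured to obtain a text to be processed;
    a second obtaining module, configured to input the text to be processed into a text classification model and obtain a similar corpus of the text to be processed output by the text classification model and a first similarity between the similar corpus and the text to be processed, the text classification model being trained on corpora annotated with intents;
    a first determining module, configured to determine a first candidate intent of the text to be processed according to the similar corpus;
    a third obtaining module, configured to extract entity information from the text to be processed and obtain a second candidate intent of the text to be processed according to the entity information;
    a fourth obtaining module, configured to obtain a second similarity between the entity information and the text to be processed; and
    a second determining module, configured to select a final intent of the text to be processed from the first candidate intent and the second candidate intent according to the first similarity and the second similarity.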
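  9. A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.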
PCT/CN2020/097006 2020-03-05 2020-06-19 文本意图识别方法、装置、计算机设备和存储介质 WO2021174717A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3174601A CA3174601C (en) 2020-03-05 2020-06-19 Text intent identifying method, device, computer equipment and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010146166.XA CN111325037B (zh) 2020-03-05 2020-03-05 文本意图识别方法、装置、计算机设备和存储介质
CN202010146166.X 2020-03-05

Publications (1)

Publication Number Publication Date
WO2021174717A1 true WO2021174717A1 (zh) 2021-09-10

Family

ID=71163911

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097006 WO2021174717A1 (zh) 2020-03-05 2020-06-19 文本意图识别方法、装置、计算机设备和存储介质

Country Status (3)

Country Link
CN (1) CN111325037B (zh)
CA (1) CA3174601C (zh)
WO (1) WO2021174717A1 (zh)


Also Published As

Publication number Publication date
CA3174601A1 (en) 2021-09-10
CN111325037A (zh) 2020-06-23
CA3174601C (en) 2024-04-02
CN111325037B (zh) 2022-03-29


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20923120

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3174601

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20923120

Country of ref document: EP

Kind code of ref document: A1