WO2021204017A1 - Text intent recognition method and apparatus, and related device - Google Patents

Text intent recognition method and apparatus, and related device

Info

Publication number
WO2021204017A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
recognized
word
features
feature
Application number
PCT/CN2021/083876
Other languages
French (fr)
Chinese (zh)
Inventor
李�杰
王健宗
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd.
Application filed by Ping An Technology (Shenzhen) Co., Ltd.
Publication of WO2021204017A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a text intent recognition method, apparatus and related device.
  • Text intent recognition mainly operates on the recognized text obtained by speech recognition of the customer's voice in an intelligent customer service system; by recognizing the intent of this text, the system determines the meaning the customer expresses and then replies to the customer with the text matching that intent.
  • The inventor realized that using only a single sentence for intent recognition may identify the wrong intent. For example, the customer's current utterance may take the previous few sentences as its premise; when the premise is not satisfied, the intent expressed by the current sentence may be completely different, leading the customer service robot to reply incorrectly, which not only degrades the customer experience but also provides customers with the wrong services.
  • The present application provides a text intent recognition method and system, which effectively addresses the intent recognition errors caused by the complexity and diversity of dialogue content when intent is recognized from a single dialogue sentence.
  • An embodiment of the present application provides a text intent recognition method, including: acquiring voice information and a text queue, and converting the voice information into text to be recognized, where the text queue includes one or more texts; extracting features from the text to be recognized and from each text in the text queue to obtain the text feature of the text to be recognized and the text feature corresponding to each text; obtaining the fusion feature corresponding to the text to be recognized according to the text feature of the text to be recognized and the text feature corresponding to each text; and classifying the fusion feature with an intent classification model to obtain the intent corresponding to the text to be recognized.
  • An embodiment of the present application provides a text intent recognition device, including: an acquiring unit for acquiring voice information and a text queue; and a preprocessing unit for converting the voice information into text to be recognized and adding it to the text queue;
  • a feature extraction unit for extracting features from the text to be recognized and from each text in the text queue to obtain the text feature of the text to be recognized and the text feature corresponding to each text;
  • a fusion unit for obtaining the fusion feature of the text to be recognized according to the text feature of the text to be recognized and the text feature of each text;
  • and a classification unit for classifying the fusion feature through the intent classification model to obtain the intent corresponding to the text to be recognized.
  • An embodiment of the present application provides a text intent recognition device, including a processor and a memory, where the processor executes the code in the memory to perform a text intent recognition method, including: acquiring voice information and a text queue, and converting the voice information into text to be recognized, where the text queue includes one or more texts; extracting features from the text to be recognized and from each text in the text queue to obtain the text feature of the text to be recognized and the text feature corresponding to each text; obtaining the fusion feature corresponding to the text to be recognized according to the text feature of the text to be recognized and the text feature corresponding to each text; and classifying the fusion feature with the intent classification model to obtain the intent corresponding to the text to be recognized.
  • An embodiment of the present application provides a computer-readable storage medium that includes instructions.
  • When the instructions are run, the computer executes a text intent recognition method, including: acquiring voice information and a text queue, and converting the voice information into text to be recognized, where the text queue includes one or more texts;
  • extracting features from the text to be recognized and from each text in the text queue to obtain the text feature of the text to be recognized and the text feature corresponding to each text;
  • obtaining the fusion feature corresponding to the text to be recognized according to the text feature of the text to be recognized and the text feature corresponding to each text; and classifying the fusion feature with the intent classification model to obtain the intent corresponding to the text to be recognized.
  • The embodiment of the application captures context matching information from the word level to the sentence level between the text to be recognized and each text in the text queue, so that feature fusion of different texts at different granularities can make full use of historical semantic information and achieve contextual information fusion.
  • Combining word-level features and sentence-level features yields a more discriminative feature, which improves the accuracy of text intent recognition.
  • FIG. 1 is a schematic diagram of a work flow of a text intent recognition intelligent customer service system provided by an embodiment of the present application
  • FIG. 2 is a flowchart of a method for text intent recognition provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of a model for extracting sentence-level text features provided by an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of a text intention recognition device provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of feature extraction provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a text intention recognition device provided by an embodiment of the present application.
  • The technical solution of the present application may involve the fields of artificial intelligence and/or big data technology, and may be used to realize intent recognition in financial technology scenarios such as intelligent question answering in a banking system.
  • the data involved in this application such as voice, text, and/or intention information, can be stored in a database, or can be stored in a blockchain, which is not limited in this application.
  • the text mentioned in this embodiment includes words or sentences.
  • A word is the collective name for words and phrases, covering single words, compound words, and phrases; words constitute the smallest structural units that make up sentences and articles.
  • A sentence is the basic unit of language use. It is composed of words and phrases, and can express a complete meaning, such as telling someone something, asking a question, making a request or giving an order, expressing a certain emotion, or indicating the continuation or omission of a statement.
  • Figure 1 shows a schematic diagram of the workflow of a text intent recognition system.
  • the framework describes the overall workflow of the intelligent customer service system.
  • The text features include word-level features and sentence-level features; the fusion feature corresponding to the text to be recognized is obtained according to the text features of the text to be recognized and of each text in the text queue; the fusion feature is classified by intent to obtain the intent corresponding to the text to be recognized; finally, the intelligent customer service system can select an appropriate reply utterance according to the current process step and the customer's intent category.
  • FIG. 2 is a flowchart of a text intent recognition method, described here as applied to the intelligent customer service system of FIG. 1. The method includes the following steps:
  • S101 Acquire voice information and a text queue, and convert the voice information into text to be recognized.
  • The voice information input by the customer is acquired, and the intelligent customer service system converts it into the text to be recognized so as to classify the text's intent, that is, the corresponding customer demand.
  • For example, the user says "I want to listen to Jay Chou's songs"; the intelligent customer service system converts the customer's voice input into the text to be recognized so as to obtain the intent of listening to a song.
  • In one embodiment, the speech recognition algorithm wav2letter++ is used to convert the customer's voice input into the corresponding text to be recognized.
  • a text queue is obtained, where the text queue includes one or more texts.
  • the text queue can hold k pieces of text.
  • the way to add text in the text queue is: after the voice information is converted into the text to be recognized, when the number of texts in the text queue is less than k, the text to be recognized is added to the text queue.
  • the k pieces of text are arranged in the order of adding time; when the number of texts in the text queue is equal to k, the first text added to the text queue is deleted, and the text to be recognized is added to the text queue.
  • For example, with k = 5, the texts in the text queue are sorted in order of entry as {1, 2, 3, 4, 5}, where 1 represents the first text added to the queue, and 2, 3, 4 and 5 follow in the same way.
  • When the queue is not yet full, texts to be recognized are added directly to the text queue in order.
  • When the queue is full, text 1 is deleted first, and then the text to be recognized is added.
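The queue maintenance described above can be sketched as follows; this is an illustrative sketch only, and the variable names and the choice k = 5 are assumptions rather than part of the embodiment:

```python
from collections import deque

# Fixed-size text queue holding at most k texts, ordered by adding time.
# deque(maxlen=k) deletes the first-added text automatically when a new
# text is appended to a full queue, matching the behavior described above.
k = 5
text_queue = deque(maxlen=k)

for i in range(1, 7):                 # add six recognized texts in order
    text_queue.append(f"text {i}")    # "text 1" is evicted on the sixth add

print(list(text_queue))               # ['text 2', 'text 3', 'text 4', 'text 5', 'text 6']
```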
  • For the text to be recognized and each text in the text queue, the word-level features are extracted first; then m attention models are used to extract the sentence-level features; finally, the word-level features and sentence-level features corresponding to the text to be recognized are combined as its features.
  • the specific steps for extracting features of the text to be recognized and each text in the text queue include:
  • the first step is to extract features at the word level
  • A word segmentation tool is used to perform word segmentation to obtain the words x, where the word segmentation tool can be jieba, SnowNLP, THULAC, NLPIR, etc.
  • The words x_i are obtained after word segmentation processing.
  • Each word is mapped to a word vector, and the n word vectors are concatenated to obtain the word vector matrix W_i of the i-th text as its word-level feature.
  • Processing the k texts in this way yields k word vector matrices {W_1, W_2, ..., W_k}. It is understandable that, after the same processing, the word-level feature W_{k+1} of the text to be recognized can be obtained.
  • The word embedding matrix V may be obtained by training the Word2vec model on 3 million pieces of text data, or by training another model, which is not limited in the embodiment of the present application.
  • Preprocessing may further include corpus cleaning, part-of-speech tagging, and stop-word removal, such as deleting noise data and removing modal particles according to a preset modal particle table, which is not limited in the embodiment of this application.
  • For example, the output after jieba word segmentation can be: "today", "every day", "weather", "how", "how", "ah", "?"; after part-of-speech tagging, the output can be: "today n", "every day v", "weather n", "how r", "how r", "ah y", "? vv", where n denotes a noun, v a verb, r a pronoun, y a modal particle, and vv a punctuation mark.
  • After removing modal particles and punctuation according to the preset modal particle list, the output can be: "today n", "every day v", "weather n", "how r", "how r". In this way, the n words of one sentence are obtained.
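The word-level extraction above can be sketched as follows. The sentence is assumed to be pre-segmented (in practice a tool such as jieba would produce the words), and the embedding matrix V here is a random stand-in rather than one trained with Word2vec:

```python
import numpy as np

# Toy vocabulary and a random stand-in for the word embedding matrix V.
vocab = {"today": 0, "weather": 1, "how": 2}
d = 8                                   # word-vector dimension (assumed)
rng = np.random.default_rng(0)
V = rng.normal(size=(len(vocab), d))

def word_level_feature(words):
    """Map each word to its vector via V and stack the n vectors into
    the word-vector matrix W_i (one row per word)."""
    return np.stack([V[vocab[w]] for w in words])

W_i = word_level_feature(["today", "weather", "how"])
print(W_i.shape)    # (n, d) = (3, 8)
```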
  • the second step is to extract features at the sentence level
  • m attention models are used to process the word vector matrix W_i to obtain m sentence-level features at different levels: u_{i,1} to u_{i,m}.
  • the output of the i-th attention model is used as the input of the i+1-th attention model, and i is a positive integer greater than or equal to 1 and less than m.
  • the output of the previous attention model in the m attention modules is used as the input of the next attention model.
  • The k texts can be processed in this way to obtain k sentence-level features {y_1, y_2, ..., y_k}.
  • Specifically, the word-level feature word vector matrix W_i is used as the input of the first attention model; the output of the first attention model is used as the input of the second attention model, and so on, each model's output in turn serving as the input of the next, until the m sentence-level features u_{i,1} to u_{i,m} at different levels are obtained.
  • Processing with m attention models in this way can obtain deeper semantic information.
  • The attention model can be understood as follows: the constituent elements of the Source are imagined as a series of <Key, Value> pairs. Given an element Query in the Target, the weight coefficient of each Key's corresponding Value is obtained by calculating the similarity or correlation between the Query and each Key; the Values are then weighted and summed to obtain the final Attention value.
  • the calculation process is as follows:
  • In the first step, the similarity or correlation is calculated from the Query and each Key, for example S(Q_t, K_i) = Q[t] · K[i]^T / √d, where t, i and j respectively index the words in Query, Key and Value, and d represents the dimension of the word vectors.
  • Q[t] · K[i]^T represents the dot product of Q[t] and K[i]^T, and the result S(Q_t, K_i) represents the similarity between the element Q[t] in the target and K[i] (with its corresponding V[j]) in the source, which captures the dependency relationships between the input words.
  • The method of calculating similarity or correlation above is only an example. In practical applications, the similarity or correlation can be computed as the vector dot product of the two, as the cosine similarity of the two vectors, or by introducing an additional neural network for evaluation, which is not limited in the embodiment of the present application.
  • In the second step, the raw score from the first step is normalized to obtain the weight coefficient.
  • Since the value range of the score produced in the first step varies with the generation method, the second step introduces the softmax calculation to transform the score: on the one hand, the raw scores are normalized into a probability distribution in which all element weights sum to 1; on the other hand, softmax's internal mechanism also highlights the weights of important elements.
  • The specific calculation is a_i = softmax(S(Q_t, K_i)), where a_i represents the weight coefficient of the Value corresponding to K_i.
  • In the third step, the Values are weighted and summed according to the weight coefficients, V_att = Σ_i a_i · V[i], where n_Q represents the number of words in Q and V_att represents the final Attention value for the element Q[t].
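The three steps can be sketched as scaled dot-product attention in Python; the scaling by √d follows the dimension d mentioned in the first step, and the shapes are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Step 1: similarity scores S = Q K^T / sqrt(d).
    Step 2: softmax turns each row of S into weights summing to 1.
    Step 3: weighted sum of the Value rows gives the Attention values."""
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)
    a = softmax(S, axis=-1)
    return a @ V

rng = np.random.default_rng(1)
W_i = rng.normal(size=(3, 8))      # word-vector matrix of one text (3 words, d = 8)
u = attention(W_i, W_i, W_i)       # self-attention: Q = K = V = W_i
print(u.shape)                     # (3, 8)
```

Feeding each output back in as the next model's input (u becoming the new Q, K and V) and repeating m times corresponds to the stacked processing that yields the m sentence-level features u_{i,1} to u_{i,m}.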
  • the third step is to obtain text features based on word-level features and sentence-level features
  • The word-level feature and the sentence-level feature of the i-th text are combined as the feature of the i-th text: [W_i, y_i]; for the k texts in the text queue, k text features are obtained. It is understandable that, for the text to be recognized, its word-level feature and sentence-level feature can likewise be combined as its feature: [W_{k+1}, y_{k+1}].
  • The embodiment of the application uses m attention models to process the word vector matrix of each text and the word vector matrix of the text to be recognized, with the output of the previous attention model used as the input of the next model, to obtain the sentence-level feature of each text.
  • The Deep Attention Matching (DAM) algorithm is then applied to the k obtained text features and the feature of the text to be recognized, matching on the word-level feature W and the sentence-level feature y, to obtain the fusion feature of the text to be recognized.
  • The word-level features of the text to be recognized are matched with the word-level features of the k obtained texts to obtain word-level matching results, and the word-level matching results are merged to obtain the first fusion feature;
  • the sentence-level features of the text to be recognized are matched with the sentence-level features of the k texts to obtain sentence-level matching results, and the sentence-level matching results are fused to obtain the second fusion feature.
  • The first fusion feature and the second fusion feature are then fused to obtain the fusion feature corresponding to the text to be recognized.
  • The idea of the DAM algorithm is to select the best-matching response from a set of candidate responses given a dialogue context. Specifically, first, each word in the context text or the response text is regarded as the central meaning of an abstract semantic segment, and stacked attention is used to construct text representations of different granularities; second, taking text relevance and dependency information into account, segment matching at different granularities is used to match each text in the context against the response.
  • The DAM algorithm captures matching information between the context and the response from the word level to the sentence level, and then extracts the important information through convolution and max-pooling operations.
  • The matching features are finally fused into a single matching score through a single-layer perceptron. In this way, feature fusion of different texts at different granularities makes full use of historical semantic information and achieves contextual information fusion.
  • The specific steps of the DAM algorithm are as follows: first, layered attention is used to construct text representations of different granularities; second, the truly matched segments are extracted from the entire context and response.
  • the DAM algorithm model framework can be: representation-matching-aggregation.
  • the following uses sentence-level feature matching as an example to introduce the DAM algorithm.
  • The first layer of the DAM algorithm is the word embedding layer, which takes the sentence-level feature y_{k+1} of the text to be recognized and the sentence-level features y_1, y_2, ..., y_k of the k texts as its input.
  • The columns of the matrix y correspond to the word-vector dimension, and its rows correspond to the text length.
  • the second layer of the DAM algorithm is the presentation layer, and the role of the presentation layer is to construct semantic representations of different granularities.
  • The presentation layer has L layers: L identical self-attention layers are stacked for processing.
  • The input of the l-th layer is the output of the (l-1)-th layer, and the input semantic vectors can be combined into a multi-granularity representation.
  • the multi-granularity representation process is as follows:
  • Attentive(·) represents the attention function; the multi-granularity representations of y_i and y_{k+1} are gradually constructed as y_i^l and y_{k+1}^l, where l ∈ {0, ..., L-1} represents the different granularities.
  • The third layer of the DAM algorithm is the matching layer: for the multi-granularity representations y_i^l and y_{k+1}^l output by the presentation layer, a self-attention matching matrix and a cross-attention matching matrix are constructed at each granularity l, and multi-granularity matching is performed to obtain the matching features.
  • the self-attention matching process is as follows:
  • Each element is computed from the k-th embedding in y_i^l and the t-th embedding in y_{k+1}^l, and reflects the textual relevance of the k-th segment in y_i and the t-th segment in y_{k+1} at the l-th granularity.
  • the cross-attention matching matrix is based on the cross-attention module, and the specific process is as follows:
  • the fourth layer of the DAM algorithm is the aggregation layer.
  • DAM finally aggregates all the segment matching degrees between the k texts in the text queue and the text to be recognized into a 3D matching image Q.
  • The specific process is as follows: f_match(·) represents the matching function, M and b are learnable parameters, and σ(·) is the sigmoid function.
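A toy sketch of the matching-layer computation may help; the shapes, the single granularity, and the max operation standing in for the convolution, pooling and perceptron stages are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
y_i = rng.normal(size=(4, 8))   # multi-granularity representation of one queue text (4 segments)
y_q = rng.normal(size=(5, 8))   # representation of the text to be recognized (5 segments)

# Self-attention matching matrix at one granularity l: element [k, t] relates
# the k-th embedding of y_i to the t-th embedding of y_q, reflecting the
# textual relevance of the corresponding segments.
M_self = y_i @ y_q.T

# Stacking such matrices over all k texts and all L granularities gives the
# 3D matching image Q; convolution and max pooling would then extract the
# important information. A plain max is used here as a crude stand-in.
score = float(M_self.max())
print(M_self.shape)             # (4, 5)
```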
  • The fusion feature is then classified by intent to obtain the intent corresponding to the text to be recognized.
  • In one embodiment, a two-layer convolutional neural network is further applied to the fused feature for deeper feature extraction and dimensionality reduction, and finally the softmax function is used for intent classification to obtain the intent corresponding to the text to be recognized.
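A minimal stand-in for this classification head is sketched below; for brevity the two convolutional layers are replaced by a single dense layer, and the intent labels and weights are assumptions drawn from the examples in this application:

```python
import numpy as np

INTENTS = ["check weather", "set alarm", "order meal", "order ticket", "play song"]

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def classify_intent(fused, W1, W2):
    """Deeper feature extraction and dimensionality reduction (a ReLU dense
    layer as a stand-in for the two conv layers), then softmax classification."""
    h = np.maximum(0.0, fused @ W1)
    probs = softmax(h @ W2)
    return INTENTS[int(np.argmax(probs))]

rng = np.random.default_rng(3)
fused = rng.normal(size=64)              # fusion feature of the text to be recognized
W1 = rng.normal(size=(64, 16))           # parameters would be learned; random here
W2 = rng.normal(size=(16, len(INTENTS)))
print(classify_intent(fused, W1, W2))
```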
  • the type of intent is preset in the intelligent customer service system.
  • The intent categories can be set to, but are not limited to, checking the weather, setting an alarm clock, ordering meals, booking tickets, playing songs, and so on.
  • If the customer inputs "I want to listen to Jay Chou's songs", it can be classified as the play-song intent; if the customer inputs "How is the weather today", it can be classified as the check-weather intent; and if the customer inputs "Help me set an alarm for 6 o'clock tomorrow morning", it can be classified as the set-alarm intent.
  • S105 Perform a corresponding action according to the intent corresponding to the text to be recognized.
  • the customer service system selects an appropriate reply utterance in the corpus to reply according to the current process link and the customer's intention category.
  • the utterances in the corpus are preset by the system.
  • For example, if the customer enters "I am in a great mood today", the intent can be classified as a mood intent.
  • The customer service system finds the corpus for the mood intent in the preset corpus and selects appropriate words to reply to the customer, such as "What put you in such a good mood? Hurry up and share it with me.".
  • The smart customer service system in the embodiment of the present application is merely an example; it does not constitute any limitation on the function and application scope of the present application.
  • the text intention recognition method provided in this application can also be applied to electronic devices such as mobile phones and computers.
  • the text intent recognition method provided in this application is also suitable for recognizing user query intent based on one or more voices input by the user.
  • FIG. 4 is a schematic structural diagram of a text intention recognition apparatus provided by an embodiment of the present application.
  • the system 400 of this embodiment includes:
  • the acquiring unit 401 is used to acquire voice information and text queues
  • the preprocessing unit 402 is used to convert voice information into text to be recognized and add it to the text queue;
  • the feature extraction unit 403 is configured to extract features of the text to be recognized and each text in the text queue to obtain the text feature of the text to be recognized and the text feature corresponding to each text;
  • the fusion unit 404 is configured to obtain the fusion feature of the text to be recognized according to the text feature of the text to be recognized and the text feature of each text;
  • the classification unit 405 is configured to classify the fusion features through the intent classification model for intent classification to obtain the intent corresponding to the text to be recognized.
  • FIG. 5 is a schematic structural diagram of a feature extraction unit provided by an embodiment of the present application.
  • The feature extraction unit 403 includes a first extraction unit 4031, a second extraction unit 4032, and a merging unit 4033.
  • the first extraction unit 4031 is used to extract the word-level features using the word embedding matrix for the text to be recognized and each text in the text queue;
  • the second extraction unit 4032 is configured to use multiple attention models to extract sentence-level features for the text to be recognized and each text in the text queue;
  • The merging unit 4033 is used to combine the word-level features and sentence-level features as the features of the text to be recognized.
  • The preprocessing unit 402 is configured to perform speech recognition with the wav2letter++ algorithm after the customer's voice information is acquired, converting the customer's voice input into the corresponding text to be recognized.
  • a text queue is obtained, where the text queue includes one or more texts.
  • the text queue can hold k pieces of text.
  • the way to add text in the text queue is: after the voice information is converted into the text to be recognized, when the number of texts in the text queue is less than k, the text to be recognized is added to the text queue.
  • the k pieces of text are arranged in the order of adding time; when the number of texts in the text queue is equal to k, the first text added to the text queue is deleted, and the text to be recognized is added to the text queue.
  • The first extraction unit 4031 is used to: first, use a word segmentation tool to perform word segmentation on each text in the text queue to obtain the words x, where the word segmentation tool can be jieba, SnowNLP, THULAC, NLPIR, etc.
  • The words x_i are obtained after word segmentation processing.
  • Each word is mapped to a word vector, and the n word vectors are concatenated to obtain the word vector matrix W_i of the i-th text as its word-level feature.
  • Processing the k texts in this way yields k word vector matrices {W_1, W_2, ..., W_k}. It is understandable that, after the same processing, the word-level feature W_{k+1} of the text to be recognized can be obtained.
  • The word embedding matrix V may be obtained by training the Word2vec model on 3 million pieces of text data, or by training another model, which is not limited in the embodiment of the present application.
  • Preprocessing may further include corpus cleaning, part-of-speech tagging, and stop-word removal, such as deleting noise data and removing modal particles according to a preset modal particle table, which is not limited in the embodiment of this application.
  • The second extraction unit 4032 is configured to process the word-level feature word vector matrix W_i extracted from the i-th text with m attention models to obtain m sentence-level features at different levels: u_{i,1} to u_{i,m}.
  • the output of the i-th attention model is used as the input of the i+1-th attention model, and i is a positive integer greater than or equal to 1 and less than m.
  • the output of the previous attention model in the m attention modules is used as the input of the next attention model.
  • The k texts can be processed in this way to obtain k sentence-level features {y_1, y_2, ..., y_k}.
  • Specifically, the word-level feature word vector matrix W_i is used as the input of the first attention model; the output of the first attention model is used as the input of the second attention model, and so on, each model's output in turn serving as the input of the next, until the m sentence-level features u_{i,1} to u_{i,m} at different levels are obtained.
  • Processing with m attention models in this way can obtain deeper semantic information.
  • The merging unit 4033 is used to combine the word-level feature and the sentence-level feature of the i-th text as the feature of the i-th text: [W_i, y_i]; for the k texts in the text queue, k text features are obtained. It is understandable that, for the text to be recognized, its word-level feature and sentence-level feature can likewise be combined as its feature: [W_{k+1}, y_{k+1}].
  • The fusion unit 404 is configured to use the DAM algorithm to match the k obtained text features and the feature of the text to be recognized on the word-level feature W and the sentence-level feature y, to obtain the fusion feature of the text to be recognized.
  • the word-level features of the text to be recognized are matched with the word-level features of the obtained k texts to obtain word-level matching results, and the word-level matching results are merged to obtain the first fusion feature;
  • the sentence-level features of the text to be recognized are matched with the sentence-level features of the k texts to obtain sentence-level matching results, and the sentence-level matching results are fused to obtain the second fusion feature.
  • the first fusion feature and the second fusion feature are fused to obtain the fusion feature corresponding to the text to be recognized.
  • In one embodiment, a two-layer convolutional neural network is further applied to the fused feature for deeper feature extraction and dimensionality reduction, and finally the softmax function is used for intent classification to obtain the intent corresponding to the text to be recognized.
  • the type of intention is preset in the intelligent customer service system.
  • the intention classification is set to, but not limited to, checking the weather, setting an alarm clock, ordering meals, ordering tickets, broadcasting songs, and so on.
  • If the customer inputs "I want to listen to Jay Chou's songs", it can be classified as the play-song intent; if the customer inputs "How is the weather today", it can be classified as the check-weather intent; and if the customer inputs "Help me set an alarm for 6 o'clock tomorrow morning", it can be classified as the set-alarm intent.
  • An embodiment of the present application provides an electronic device, which may perform the text intent recognition method of any of the foregoing embodiments of the present application.
  • the electronic device may be, for example, a terminal device or a server or other devices.
  • the embodiment of the present application also provides another electronic device, including:
  • the processor and the memory, where the processor executes the code in the memory, thereby completing the operations of the text intent recognition method according to any of the foregoing embodiments of the present application.
  • FIG. 6 is a structural block diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device may be the aforementioned text intent recognition device.
  • FIG. 6 shows a schematic structural diagram of an electronic device suitable for implementing a terminal device or a server in the embodiments of the present application.
  • the electronic device includes: one or more processors 601; one or more input devices 602, one or more output devices 603, and a memory 604.
  • the aforementioned processor 601, input device 602, output device 603, and memory 604 are connected via a bus 605.
  • the memory 604 is used to store instructions, and the processor 601 is used to execute the instructions stored in the memory 604.
  • the processor 601 is configured to call program instructions to execute:
  • the fusion feature corresponding to the text to be recognized is obtained;
  • the fusion features are classified by the intention classification model, and the intent corresponding to the text to be recognized is obtained.
  • the processor 601 may be a central processing unit (Central Processing Unit, CPU), and the processor may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the input device 602 may include a camera, where the camera has a function of storing image files and a function of transmitting image files.
  • the output device 603 may include a display, a hard disk, a USB flash drive, and the like.
  • the memory 604 may include a read-only memory and a random access memory, and provides instructions and data to the processor 601. A part of the memory 604 may also include a non-volatile random access memory. For example, the memory 604 may also store device type information.
  • the processor 601, the input device 602, and the output device 603 described in the embodiments of the present application can execute the implementations described in the various embodiments of the text intent recognition method and system provided in the embodiments of the present application; details are not repeated here.
  • a computer-readable storage medium stores a computer program.
  • the computer program includes program instructions which, when executed, perform the following: acquire voice information and a text queue, and convert the voice information into text to be recognized, where the text queue includes one or more texts; extract features from the text to be recognized and each text in the text queue to obtain the text feature of the text to be recognized and the text feature corresponding to each text; obtain the fusion feature of the text to be recognized according to the text feature of the text to be recognized and the text feature of each text; and classify the fusion feature through the intent classification model to obtain the intent corresponding to the text to be recognized.
  • the storage medium involved in this application such as a computer-readable storage medium, may be non-volatile or volatile.
  • the computer-readable storage medium may be an internal storage unit of the electronic device of any of the foregoing embodiments, such as a hard disk or memory of a terminal.
  • the computer-readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk equipped on the terminal, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), and the like.
  • the computer-readable storage medium may also include both an internal storage unit of an electronic device and an external storage device.
  • the computer-readable storage medium is used to store computer programs and other programs and data required by electronic devices.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or will be output.
  • the disclosed server, device, and method may be implemented in other ways.
  • the server embodiments described above are only illustrative; for example, the division of units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • when the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

A text intent recognition method, comprising: acquiring speech information and a text queue, and converting the speech information into text to be recognized (S101); extracting features from the text to be recognized and each piece of text in the text queue, so as to obtain text features of the text to be recognized and text features corresponding to each piece of text (S102); according to the text features of the text to be recognized and the text features of each piece of text, obtaining fused features of the text to be recognized (S103); and performing intent classification on the fused features to obtain an intent corresponding to the text to be recognized (S104). Context matching information between the text to be recognized and each piece of text in the text queue is captured from the word level to the sentence level. In this way, feature fusion is performed on different pieces of text at different granularities, such that historical semantic information can be fully used to fuse context information, and word-level and sentence-level features are combined to obtain a more discriminative feature, thereby improving the accuracy of text intent recognition.

Description

Text intent recognition method, apparatus, and related equipment
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on November 20, 2020 with application number 202011309413.X and entitled "Text Intent Recognition Method, Apparatus, and Related Equipment", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a text intent recognition method, apparatus, and related equipment.
Background
With the increasing popularity of computer network technology, text intent recognition is widely used in products such as intelligent voice assistants and intelligent dialogue robots. In order to better understand customer needs, respond more accurately, and improve customer satisfaction, a machine dialogue system is required to accurately and completely identify the actual intent of a passage sent by the customer.
At present, text intent recognition mainly operates on the recognized text obtained by performing speech recognition on the customer's voice in an intelligent customer service system; the system determines the meaning expressed by the customer by recognizing the intent of the text, and then replies to the customer with the text matching that intent. The inventor realized that using only a single sentence for intent recognition may identify a wrong intent. For example, the customer's current utterance may take the previous few sentences as its premise; when the premise is not satisfied, the intent expressed by the current sentence may be completely different. This leads to wrong replies from the customer service robot, which not only degrades the customer experience but also provides customers with wrong services.
Summary
The present application provides a text intent recognition method and system, which effectively solves the problem of intent recognition errors caused by the complexity and diversity of dialogue content in previous single-sentence dialogue intent recognition.
In the first aspect, an embodiment of the present application provides a text intent recognition method, including: acquiring voice information and a text queue, and converting the voice information into text to be recognized, where the text queue includes one or more texts; extracting features from the text to be recognized and each text in the text queue to obtain the text feature of the text to be recognized and the text feature corresponding to each text; obtaining the fusion feature corresponding to the text to be recognized according to the text feature of the text to be recognized and the text feature corresponding to each text; and classifying the fusion feature through an intent classification model to obtain the intent corresponding to the text to be recognized.
In the second aspect, an embodiment of the present application provides a text intent recognition apparatus, including: an acquisition unit for acquiring voice information and a text queue; a preprocessing unit for converting the voice information into text to be recognized and adding it to the text queue; a feature extraction unit for extracting features from the text to be recognized and each text in the text queue to obtain the text feature of the text to be recognized and the text feature corresponding to each text; a fusion unit for obtaining the fusion feature of the text to be recognized according to the text feature of the text to be recognized and the text feature of each text; and a classification unit for classifying the fusion feature through an intent classification model to obtain the intent corresponding to the text to be recognized.
In the third aspect, an embodiment of the present application provides a text intent recognition device, including a processor and a memory, where the processor executes the code in the memory to perform a text intent recognition method, including: acquiring voice information and a text queue, and converting the voice information into text to be recognized, where the text queue includes one or more texts; extracting features from the text to be recognized and each text in the text queue to obtain the text feature of the text to be recognized and the text feature corresponding to each text; obtaining the fusion feature corresponding to the text to be recognized according to the text feature of the text to be recognized and the text feature corresponding to each text; and classifying the fusion feature through an intent classification model to obtain the intent corresponding to the text to be recognized.
In the fourth aspect, a computer-readable storage medium includes instructions which, when run on a computer, cause the computer to execute a text intent recognition method, including: acquiring voice information and a text queue, and converting the voice information into text to be recognized, where the text queue includes one or more texts; extracting features from the text to be recognized and each text in the text queue to obtain the text feature of the text to be recognized and the text feature corresponding to each text; obtaining the fusion feature corresponding to the text to be recognized according to the text feature of the text to be recognized and the text feature corresponding to each text; and classifying the fusion feature through an intent classification model to obtain the intent corresponding to the text to be recognized.
The embodiments of the application capture context matching information between the text to be recognized and each text in the text queue from the word level to the sentence level. Performing feature fusion on different texts at different granularities in this way makes full use of historical semantic information and achieves the fusion of context information; combining word-level features and sentence-level features yields a more discriminative feature, which improves the accuracy of text intent recognition.
Brief Description of the Drawings
In order to describe the technical solutions in the embodiments of the present application or the background art more clearly, the drawings needed in the embodiments of the present application or the background art are described below.
FIG. 1 is a schematic diagram of the workflow of an intelligent customer service system for text intent recognition provided by an embodiment of the present application;

FIG. 2 is a flowchart of a text intent recognition method provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a model for extracting sentence-level features of text provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a text intent recognition apparatus provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of feature extraction provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a text intent recognition device provided by an embodiment of the present application.
Detailed Description
The terms used in the embodiments of the application are only used to explain the specific embodiments of the application and are not intended to limit the application.
The technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
It should be understood that, when used in this specification and the appended claims, the terms "including" and "comprising" indicate the existence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the existence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terms used in the specification of this application are only for the purpose of describing specific embodiments and are not intended to limit the application. As used in the specification of this application and the appended claims, unless the context clearly indicates otherwise, the singular forms "a", "an", and "the" are intended to include the plural forms.
It should be further understood that the term "and/or" used in the specification and appended claims of this application refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
The technical solution of the present application may involve the fields of artificial intelligence and/or big data technology, and may be used in scenarios such as financial technology, for example intelligent question answering in a banking system, to realize intent recognition. Optionally, the data involved in this application, such as voice, text, and/or intent information, may be stored in a database or in a blockchain, which is not limited in this application.
The text mentioned in this embodiment includes words or sentences. "Words" is a collective name for words and phrases, including words (single words and compound words) and word groups (also known as phrases), the smallest word-composition units of a sentence. A sentence is the basic unit of language use; it is composed of words and phrases and can express a complete meaning, such as telling someone something, asking a question, expressing a request or prohibition, expressing some emotion, or expressing the continuation or omission of a passage.
First, the intelligent customer service system for text intent recognition involved in the embodiments of the present application is described.
FIG. 1 shows a schematic diagram of the workflow of a text intent recognition system; this framework describes the overall workflow of the intelligent customer service system. In this embodiment, the customer's voice information is first acquired and speech recognition is performed to obtain the text to be recognized, and the text to be recognized is added to a text queue, where the text queue includes one or more texts to be recognized. Then features are extracted from the text to be recognized and each text in the text queue to obtain the text feature of the text to be recognized and the text feature of each text in the text queue, where the text features include word-level features and sentence-level features. According to the text feature of the text to be recognized and the text feature of each text in the text queue, the fusion feature corresponding to the text to be recognized is obtained; the fusion feature is classified by intent to obtain the intent corresponding to the text to be recognized. Finally, the intelligent customer service system can select appropriate reply words according to the current process link and the customer intent category.
In a specific embodiment, as shown in FIG. 2, a flowchart of a text intent recognition method is provided. Taking the application of this method to the intelligent customer service system in FIG. 1 as an example, the method includes the following steps:
S101: Acquire voice information and a text queue, and convert the voice information into text to be recognized.
In the specific embodiment of the present application, the voice information input by the customer is acquired; the intelligent customer service system converts the voice information into text to be recognized and obtains the text classification intent, that is, the corresponding customer demand intent. For example, the user inputs "I want to listen to Jay Chou's songs", and the intelligent customer service system converts the voice input by the customer into text to be recognized, obtaining the intent of listening to a song. After the customer's voice information is acquired, the wav2letter++ speech recognition algorithm is used to convert the voice input by the customer into the corresponding text to be recognized. At the same time, a text queue is acquired, where the text queue includes one or more texts. The text queue can hold k texts, and text is added to the queue as follows: after the voice information is converted into text to be recognized, if the number of texts in the text queue is less than k, the text to be recognized is added to the text queue, and the k texts in the queue are arranged in the order in which they were added; if the number of texts in the text queue is equal to k, the text that was added to the queue first is deleted, and the text to be recognized is then added to the queue.
For example, if the size of the text queue is 5, the texts in the queue, sorted in order of entry, are {1, 2, 3, 4, 5}, where 1 represents the first text added to the queue, and 2, 3, 4, 5 follow by analogy. When the number of texts in the queue does not exceed 5, the text to be recognized is added directly to the queue in order; when the number of texts in the queue is 5, text 1 is deleted first, and then the text to be recognized is added.
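The queue behavior in this example can be sketched directly. The capacity K = 5 and the helper name `add_text` are illustrative choices:

```python
from collections import deque

K = 5  # capacity k of the text queue

def add_text(queue: deque, text: str) -> None:
    """Add a newly recognized text; evict the oldest text when the queue is full."""
    if len(queue) == K:
        queue.popleft()        # delete the text that entered the queue first
    queue.append(text)

queue = deque()
for t in ["1", "2", "3", "4", "5", "6"]:
    add_text(queue, t)
# after adding six texts to a queue of capacity 5, text "1" has been evicted
```

The same behavior can also be obtained with `deque(maxlen=5)`, which evicts the oldest element automatically on append.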
S102: Extract features from the text to be recognized and each text in the text queue to obtain the text feature of the text to be recognized and the text feature corresponding to each text.
In a specific embodiment, for the text to be recognized and each text in the text queue: first, word-level features are extracted; then, m attention models are used to extract sentence-level features; finally, the word-level features and sentence-level features corresponding to the text to be recognized are combined as the features of the text to be recognized.
The specific steps for extracting features from the text to be recognized and each text in the text queue include:
The first step is to extract word-level features.
Specifically, first, for each text in the text queue, a word segmentation tool is used to perform word segmentation to obtain the words x, where the word segmentation tool can be jieba, SnowNLP, THULAC, NLPIR, etc.; this does not constitute any limitation in the embodiments of the present application. For the i-th recognized text in the text queue, the words x_i can be obtained after word segmentation. Then, the n words x_i in the i-th recognized text are mapped to the word embedding matrix V to obtain n word vectors V(x_i). Finally, the n word vectors are concatenated to obtain the word vector matrix W_i of the i-th recognized text as the word-level feature. After the k texts are processed, k word vector matrices {W_1, W_2, …, W_k} can be obtained. It is understandable that, after the above processing of the text to be recognized, the word-level feature W_{k+1} of the text to be recognized can be obtained.
The word embedding matrix V may be obtained by training a Word2vec model on 3 million pieces of text data, or by training other models; the embodiments of this application do not limit this. Before or after word segmentation, corpus cleaning, part-of-speech tagging, and stop-word removal may be performed, such as deleting noise data or removing modal particles according to a preset modal particle table, which the embodiments of this application likewise do not limit.
For example, taking jieba word segmentation as an example, when the input recognized text is "今天天气怎么样呀?" ("How is the weather today?"), the output after jieba word segmentation can be: "today", "every day", "weather", "how", "how about", "ya", "?". After part-of-speech tagging, the output can be: "today n", "every day v", "weather n", "how r", "how about r", "ya y", "? vv", where n denotes a noun, v a verb, r a pronoun, y a modal particle, and vv a punctuation mark. After stop-word removal, the output can be: "today n", "every day v", "weather n", "how r", "how about r". The modal particles can be removed according to a preset modal particle table. In this way, the n words of one sentence can be obtained.
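The word-level extraction described above (segment the text, look up each word in the embedding matrix V, stack the vectors into W_i) might be sketched as follows. The toy vocabulary, the random embedding matrix, and the English tokens are stand-ins for a trained Word2vec matrix and real segmenter output:

```python
import numpy as np

# toy stand-in for the trained Word2vec embedding matrix V (hypothetical vocab)
vocab = {"today": 0, "every day": 1, "weather": 2, "how": 3, "how about": 4}
rng = np.random.default_rng(0)
V = rng.normal(size=(len(vocab), 8))      # |vocab| x d embedding matrix, d = 8

def word_vector_matrix(tokens):
    """Map the n segmented words of one text to V and stack the
    resulting word vectors into the matrix W_i, shape (n, d)."""
    return np.stack([V[vocab[t]] for t in tokens if t in vocab])  # skip OOV

W_i = word_vector_matrix(["today", "weather", "how about"])  # shape (3, 8)
```

In a real system the lookup table would come from the trained model (e.g. gensim's Word2vec), and out-of-vocabulary handling would be more careful than silently skipping tokens.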
The second step is to extract sentence-level features.
Specifically, for the word-level feature word vector matrix W_i extracted from the i-th text, m attention models are used to process W_i, yielding m sentence features at different levels: u_{i,1} to u_{i,m}, where the output of the j-th attention model serves as the input of the (j+1)-th attention model, and j is a positive integer greater than or equal to 1 and less than m. That is, among the m attention modules, the output of the previous attention model is the input of the next. y_i = {u_{i,1}, …, u_{i,m}} is taken as the sentence-level feature of one text; after the k texts are processed, k sentence-level features {y_1, y_2, …, y_k} can be obtained. As shown in FIG. 3, the word-level feature word vector matrix W_i is the input of the first attention model, the output of the first attention model is the input of the second, and so on, with each model's output feeding the next, finally yielding m sentence features u_{i,1} to u_{i,m} at different levels. Processing with m attention models in this way can obtain deeper semantic information. It is understandable that, after the above processing of the text to be recognized, the sentence-level feature y_{k+1} = {u_{k+1,1}, …, u_{k+1,m}} of the text to be recognized can be obtained.
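The chained attention models might be sketched as below. Using plain self-attention (Q = K = V = the block input) for each model and mean pooling to obtain each u_{i,j} are assumptions, since the text does not fix these details:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_block(X):
    """One attention model; Q = K = V = X here (a simplifying assumption)."""
    d = X.shape[-1]
    return softmax(X @ X.T / np.sqrt(d)) @ X

def sentence_level_features(W_i, m=3):
    """Chain m attention models (the output of one feeds the next) and
    mean-pool each model's output into one sentence-level vector u_{i,j}."""
    feats, X = [], W_i
    for _ in range(m):
        X = attention_block(X)
        feats.append(X.mean(axis=0))      # u_{i,j}, shape (d,)
    return feats                          # y_i = {u_{i,1}, ..., u_{i,m}}

W_i = np.random.default_rng(0).normal(size=(6, 8))  # 6 words, d = 8
y_i = sentence_level_features(W_i, m=3)
```

Each of the m pooled vectors captures progressively higher-level interactions between the words, which is the "deeper semantic information" the passage refers to.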
The attention model can be understood as follows: imagine that the constituent elements in the Source are composed of a series of <Key, Value> pairs. Given an element Query in the Target, the similarity or correlation between the Query and each Key is computed to obtain the weight coefficient of the Value corresponding to each Key; the Values are then summed with these weights to obtain the final Attention value. The calculation process is as follows:
Step 1: Calculate the similarity or correlation between the Query and the Key;
S(Q_t, K_i) = Q[t] · K[i]^T / √d
where t, i, and j index the words in Query, Key, and Value respectively, and d is the word dimension. Q[t]·K[i]^T denotes the dot product of Q[t] and K[i]^T, and the result S(Q_t, K_i) represents the similarity between an element Q[t] in the Target and the Value V[j] corresponding to K[i] in the Source, capturing the dependency between input words. It should be understood that the way similarity or correlation is computed in this embodiment is only an example; in practice, similarity or correlation may be computed as the vector dot product of the two, as the cosine similarity of their vectors, by introducing an additional neural network for scoring, and so on, none of which is limited by the embodiments of this application.
Step 2: normalize the raw scores from step 1 to obtain the weight coefficients;
Specifically, the value range of the scores produced in step 1 differs depending on how they are generated, so step 2 introduces a softmax computation to transform them: on the one hand it normalizes the raw scores into a probability distribution in which the element weights sum to 1; on the other hand, softmax's intrinsic mechanism further accentuates the weights of important elements. The specific computation is as follows:
a_{t,i} = softmax(S(Q_t, K_i)) = exp(S(Q_t, K_i)) / Σ_j exp(S(Q_t, K_j))
where a_{t,i} is the weight matrix, representing the weight coefficient of the Value corresponding to each Key.
Step 3: compute the weighted sum of the Values according to the weight coefficients.
V_att = Σ_i a_{t,i} · V[i], for each t = 1, …, n_Q
where n_Q is the number of words in Q, and V_att is the final Attention value for the element Q[t].
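The three steps above can be sketched with toy Q, K, V values; in this sketch the step-1 similarity is a dot product scaled by √d (the √d scaling is an assumption consistent with d being the word dimension described above):

```python
import math

def attention(Q, K, V):
    """For each query Q[t]: score every key, softmax-normalize the scores,
    and return the weighted sum of the values (the final Attention value)."""
    d = len(K[0])
    result = []
    for q in Q:
        # Step 1: similarity S(Q_t, K_i) = Q[t]·K[i]^T / sqrt(d)
        scores = [sum(qv * kv for qv, kv in zip(q, k)) / math.sqrt(d) for k in K]
        # Step 2: softmax normalization -> weight coefficients a_{t,i}
        mx = max(scores)
        exps = [math.exp(s - mx) for s in scores]
        a = [e / sum(exps) for e in exps]
        # Step 3: weighted sum of the Values -> V_att
        result.append([sum(w * v[j] for w, v in zip(a, V)) for j in range(len(V[0]))])
    return result

Q = [[1.0, 0.0]]                    # one query word
K = [[1.0, 0.0], [0.0, 1.0]]        # two keys
V = [[1.0, 2.0], [3.0, 4.0]]        # their values
V_att = attention(Q, K, V)          # one attention vector per query word
```

The query aligns more strongly with the first key, so the output leans toward V[0] while still mixing in V[1] according to the softmax weights.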
Third step: obtain the text features from the word-level features and the sentence-level features
Specifically, the word-level features and the sentence-level features of the i-th recognized text are combined as the features of the i-th recognized text: [W_i, y_i]; for the k texts in the text queue, k text features are obtained. It can be understood that for the text to be recognized, combining its word-level features and sentence-level features as its features yields [W_{k+1}, y_{k+1}].
In this embodiment of the application, the word vector matrix of each text and the word vector matrix of the text to be recognized are processed by m attention models, with the output of each attention model serving as the input of the next, so that features at m different levels are obtained for each text and for the text to be recognized, yielding richer, multi-level, multi-granularity features.
S103: obtain the fusion feature corresponding to the text to be recognized according to the text feature of the text to be recognized and the text feature corresponding to each text in the text queue.
In a specific embodiment, the k obtained text features and the feature of the text to be recognized are matched by the Deep Attention Matching (DAM) algorithm, separately on the word-level features W and the sentence-level features y, to obtain the fusion feature of the text to be recognized. Specifically, the word-level features of the text to be recognized are matched with the word-level features of the k obtained texts to produce word-level matching results, which are fused into a first fusion feature; the sentence-level features of the text to be recognized are matched with the sentence-level features of the k texts to produce sentence-level matching results, which are fused into a second fusion feature. The first fusion feature and the second fusion feature are then fused to obtain the fusion feature corresponding to the text to be recognized.
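As a rough sketch of this two-track matching and fusion (a heavy simplification: real DAM matching operates on matrices at stacked granularities, flattened here to single vectors and dot products for brevity), the toy code below matches the features of the text to be recognized against the k queued texts at the word level and at the sentence level, then concatenates the two matching results:

```python
def match_scores(query_feat, history_feats):
    # Dot-product matching of one feature vector against k history vectors.
    return [sum(a * b for a, b in zip(query_feat, h)) for h in history_feats]

def fuse(text_feat, history):
    # text_feat / history entries: (word_vec, sent_vec) pairs.
    w_q, s_q = text_feat
    word_match = match_scores(w_q, [w for w, _ in history])  # first fusion feature
    sent_match = match_scores(s_q, [s for _, s in history])  # second fusion feature
    return word_match + sent_match  # fuse the two into one feature

# k = 2 queued texts, each with toy (word-level, sentence-level) features
history = [([1.0, 0.0], [0.5, 0.5]), ([0.0, 1.0], [0.2, 0.8])]
fused = fuse(([1.0, 1.0], [0.4, 0.6]), history)
```

The fused vector carries one word-level and one sentence-level matching score per queued text, so the downstream classifier sees both granularities of context.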
The idea of the DAM algorithm is to select the best-matching response from a set of candidate responses given a dialogue context. Specifically, each word in the context text or the response text is first taken as the central meaning of an abstract semantic segment, and stacked attention is used to construct text representations at different granularities. Second, taking text relevance and dependency information into account, segment matching at different granularities is used to match each text in the context and the response; in this way, the DAM algorithm captures matching information between the context and the response from the word level up to the sentence level. Important matching features are then extracted by convolution and max-pooling operations, and finally fused into a single matching score by a single-layer perceptron. Fusing the features of different texts at different granularities in this way makes full use of historical semantic information and achieves contextual information fusion.
The specific steps of the DAM algorithm are: first, use stacked attention to construct text representations at different granularities; second, extract the truly paired segments across the entire context and response.
Specifically, the DAM model framework can be summarized as representation, matching, and aggregation. The DAM algorithm is introduced below using sentence-level feature matching as an example.
The first layer of the DAM algorithm is the word embedding layer, which takes the sentence-level feature y_{k+1} of the text to be recognized and the k sentence-level features y_1, y_2, …, y_k as input. The columns of a matrix y correspond to the word vector dimensions, and its rows to the length of the text.
The second layer of the DAM algorithm is the representation layer, whose role is to construct semantic representations at different granularities. The representation layer consists of L identical stacked self-attention layers; the input of the l-th layer is the output of the (l-1)-th layer, so the input semantic vectors can be combined into multi-granularity representations. The multi-granularity representation process is as follows:
y_i^l = Attentive(y_i^(l-1), y_i^(l-1), y_i^(l-1)), l = 1, …, L

y_{k+1}^l = Attentive(y_{k+1}^(l-1), y_{k+1}^(l-1), y_{k+1}^(l-1)), l = 1, …, L
where Attentive denotes the attention function. The multi-granularity representations of y_i and y_{k+1} are constructed step by step as {y_i^0, y_i^1, …, y_i^L} and {y_{k+1}^0, y_{k+1}^1, …, y_{k+1}^L}, where l ∈ {0, 1, …, L} indexes the different granularities (y^0 being the input of the first self-attention layer, so there are L+1 granularities in total).
The third layer of the DAM algorithm is the matching layer. For the multi-granularity representations y_i^l and y_{k+1}^l of each text output by the representation layer, a self-attention matching matrix M_self^{i,l} and a cross-attention matching matrix M_cross^{i,l} are constructed at each granularity l, and multi-granularity matching is performed to obtain the matching features.
The self-attention matching process is as follows:

M_self^{i,l} = { y_i^l[s]^T · y_{k+1}^l[t] }, s = 1, …, n_i, t = 1, …, n_{k+1}

where n_i and n_{k+1} denote the number of words in each sentence text. Each element of the matrix M_self^{i,l} is the dot product of the s-th embedding in y_i^l and the t-th embedding in y_{k+1}^l, reflecting the textual relevance, at the l-th granularity, between the s-th segment of y_i and the t-th segment of y_{k+1} (the segment index is written s here to avoid confusion with the queue size k).
The cross-attention matching matrix is based on a cross-attention module; the specific process is as follows:

ỹ_i^l = Attentive(y_i^l, y_{k+1}^l, y_{k+1}^l)

ỹ_{k+1}^l = Attentive(y_{k+1}^l, y_i^l, y_i^l)

M_cross^{i,l} = { ỹ_i^l[s]^T · ỹ_{k+1}^l[t] }, s = 1, …, n_i, t = 1, …, n_{k+1}

Through the attention module, y_i^l and y_{k+1}^l attend to each other, constructing two new representations ỹ_i^l and ỹ_{k+1}^l. M_cross^{i,l} captures the semantic structure spanning the k texts in the text queue and the text to be recognized. As a result, mutually dependent segments within the dialogue text lie close to each other in the representation, and the dot products between these latent internal dependencies are increased, providing dependency-aware matching information.
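A minimal sketch of the two matching matrices at one granularity l follows; the AttentiveModule is simplified here to scaled dot-product cross-attention, and all shapes and values are toy assumptions:

```python
import math

def dot_matrix(A, B):
    # Matching matrix: element [s][t] is the dot product A[s]·B[t].
    return [[sum(a * b for a, b in zip(ra, rb)) for rb in B] for ra in A]

def attentive(Q, K, V):
    # Simplified AttentiveModule: scaled dot-product cross-attention.
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(x * y for x, y in zip(q, k)) / math.sqrt(d) for k in K]
        mx = max(scores)
        w = [math.exp(s - mx) for s in scores]
        z = sum(w)
        out.append([sum(wi / z * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

y_i = [[1.0, 0.0], [0.0, 1.0]]                  # history text i: 2 words
y_next = [[1.0, 1.0], [0.5, 0.5], [0.0, 1.0]]   # text to be recognized: 3 words

# Self-attention matching: raw dot products between the two representations
M_self = dot_matrix(y_i, y_next)
# Cross-attention matching: each side first attends over the other, then match
M_cross = dot_matrix(attentive(y_i, y_next, y_next),
                     attentive(y_next, y_i, y_i))
```

Both matrices have one row per word of the history text and one column per word of the text to be recognized, so stacking them over granularities yields the multi-channel "image" aggregated in the next step.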
The fourth layer of the DAM algorithm is the aggregation layer. DAM finally aggregates all segment matching degrees between the k texts in the text queue and the text to be recognized into a 3D matching image Q; the specific process is as follows:

Q = { Q_{i,s,t} }, i = 1, …, k, s = 1, …, n_i, t = 1, …, n_{k+1}

Q_{i,s,t} = [ M_self^{i,l}[s,t] ⊕ M_cross^{i,l}[s,t] ], l = 0, …, L

where ⊕ denotes the concatenation operation, so each pixel has 2(L+1) channels storing the matching degree between a specific pair of segments at the different granularity levels. The DAM algorithm then uses two convolution layers with max-pooling to extract the important matching features f_match(y_i, y_{k+1}) from the whole image. Finally, a single-layer perceptron computes the matching score g(y_i, y_{k+1}) from the extracted matching features f_match(y_i, y_{k+1}); the specific process is as follows:

g(y_i, y_{k+1}) = σ(M · f_match(y_i, y_{k+1}) + b)

where f_match(·) denotes the matching function, M and b are learned parameters, and σ is the sigmoid function.
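The final scoring step can be sketched as follows; M, b, and the pooled feature vector are toy assumptions standing in for the learned parameters and the convolution/max-pooling output:

```python
import math

def matching_score(f_match, M, b):
    # g = sigmoid(M·f_match + b): single-layer perceptron on pooled features.
    z = sum(m * f for m, f in zip(M, f_match)) + b
    return 1.0 / (1.0 + math.exp(-z))

f = [0.2, 0.7, 0.1]                        # pooled matching features (toy)
g = matching_score(f, M=[1.0, 1.0, 1.0], b=0.0)  # score in (0, 1)
```

Because σ squashes the perceptron output, g always lies strictly between 0 and 1 and can be read as a matching probability.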
For brevity, only the matching and fusion of sentence-level features based on the DAM algorithm is described above; the word-level features are matched and fused based on the DAM algorithm in a similar way, which is not repeated here.
S104: classify the fusion feature by intent to obtain the intent corresponding to the recognized text.
In a specific embodiment, a two-layer convolutional neural network is further applied to the fused features for deeper feature extraction and dimensionality reduction, and finally the softmax function is used for intent classification to obtain the intent corresponding to the recognized text. The intent categories are preset in the intelligent customer service system.
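A hedged sketch of the classification step: a single linear layer plus softmax stands in for the two-layer CNN described above, and the weights and intent labels are invented for illustration:

```python
import math

def classify_intent(fused_feature, weights, intents):
    # Linear scoring per preset intent, then softmax over the scores.
    logits = [sum(w * f for w, f in zip(row, fused_feature)) for row in weights]
    mx = max(logits)
    exps = [math.exp(l - mx) for l in logits]
    probs = [e / sum(exps) for e in exps]
    return intents[probs.index(max(probs))], probs

intents = ["check_weather", "set_alarm", "play_song"]
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # toy weights, one row per intent
intent, probs = classify_intent([2.0, 0.1], W, intents)
```

The softmax output is a probability distribution over the preset intent categories; the highest-probability category is taken as the recognized intent.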
Optionally, in a task-oriented robot customer service system, the intent categories are set to, but are not limited to, checking the weather, setting an alarm, ordering meals, booking tickets, playing songs, and so on. For example, if the customer enters "I want to listen to Jay Chou's songs", it can be classified as a play-song intent; if the customer enters "How is the weather today", it can be classified as a check-weather intent; if the customer enters "Set an alarm for 6 o'clock tomorrow morning for me", it can be classified as a set-alarm intent.
S105: perform the corresponding action according to the intent corresponding to the text to be recognized.
In a specific embodiment, after the classified intent is obtained, the customer service system selects an appropriate reply utterance from the corpus according to the current stage of the process flow and the customer's intent category. The utterances in the corpus are preset by the system. For example, after intent classification, the customer input "I am in a great mood today" can be classified as a mood intent; the customer service system finds the mood-intent material in the preset corpus and selects an appropriate utterance to reply to the customer, such as "What put you in such a good mood? Hurry up and share it with me."
It can be understood that the intelligent customer service system in the embodiments of this application is merely an example, and this example does not impose any specific limitation on the functions and scope of application of this application. The text intent recognition method provided by this application can also be applied to electronic devices such as mobile phones and computers. For example, in a search engine, the text intent recognition method provided by this application is also suitable for recognizing a user's query intent from one or more voice inputs by the user.
An embodiment of this application further provides a text intent recognition apparatus, which can be used to implement the foregoing text intent recognition method embodiments of this application. Specifically, referring to FIG. 4, FIG. 4 is a schematic structural diagram of a text intent recognition apparatus provided by an embodiment of this application. The system 400 of this embodiment includes:
an acquiring unit 401, configured to acquire voice information and a text queue;

a preprocessing unit 402, configured to convert the voice information into text to be recognized and add it to the text queue;

a feature extraction unit 403, configured to extract features from the text to be recognized and from each text in the text queue to obtain the text feature of the text to be recognized and the text feature corresponding to each text;

a fusion unit 404, configured to obtain the fusion feature of the text to be recognized according to the text feature of the text to be recognized and the text feature of each text;

a classification unit 405, configured to classify the fusion feature through an intent classification model to obtain the intent corresponding to the text to be recognized.
In a specific implementation, referring to FIG. 5, FIG. 5 is a schematic structural diagram of a feature extraction unit provided by an embodiment of this application. The feature extraction unit 403 includes a first extraction unit 4031, a second extraction unit 4032, and a merging unit 4033:
the first extraction unit 4031 is configured to extract word-level features from the text to be recognized and from each text in the text queue using a word embedding matrix;

the second extraction unit 4032 is configured to extract sentence-level features from the text to be recognized and from each text in the text queue using multiple attention models;

the merging unit 4033 is configured to combine the word-level features and the sentence-level features as the features of the recognized text.
In a specific embodiment of the text intent recognition apparatus of this application, after the acquiring unit 401 obtains the customer's voice information, the wav2letter++ speech recognition algorithm is used for speech recognition to convert the voice input by the customer into the corresponding text to be recognized. At the same time, a text queue is acquired, where the text queue includes one or more texts. The text queue can hold k texts, and texts are added to it as follows: after the voice information is converted into the text to be recognized, if the number of texts in the text queue is less than k, the text to be recognized is added to the text queue, and the k texts in the queue are arranged in the order in which they were added; if the number of texts in the text queue is equal to k, the text that was added to the queue earliest is deleted, and the text to be recognized is added to the queue.
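The queue behavior described here maps directly onto a fixed-capacity FIFO; a minimal sketch (class and method names are illustrative, not from the patent):

```python
from collections import deque

class TextQueue:
    """Fixed-capacity text queue: holds at most k texts in arrival order;
    when full, the earliest-added text is dropped before appending."""
    def __init__(self, k):
        self.texts = deque(maxlen=k)  # deque evicts the oldest automatically

    def add(self, recognized_text):
        self.texts.append(recognized_text)

    def contents(self):
        return list(self.texts)

q = TextQueue(k=3)
for t in ["t1", "t2", "t3", "t4"]:
    q.add(t)
# "t1" (the earliest-added text) has been evicted to make room for "t4"
```

`deque(maxlen=k)` implements exactly the delete-oldest-then-append rule, so no explicit size check is needed.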
In a specific embodiment, the first extraction unit 4031 is configured to: first, perform word segmentation on each text in the text queue using a segmentation tool to obtain the words x, where the tool may be jieba, SnowNLP, THULAC, NLPIR, etc., none of which constitutes any limitation in the embodiments of this application. For the i-th recognized text in the text queue, the words x_i are obtained after segmentation. Then, the n words x_i of the i-th recognized text are mapped into the word embedding matrix V to obtain n word vectors V(x_i). Finally, the n word vectors are concatenated to obtain the word vector matrix W_i of the i-th recognized text as its word-level feature. Processing the k texts yields k word vector matrices {W_1, W_2, …, W_k}. It can be understood that, after the same processing, the word-level feature W_{k+1} of the text to be recognized is obtained.
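A toy sketch of this word-level feature extraction: the embedding dictionary V and whitespace segmentation below are stand-ins for a trained Word2vec embedding matrix and a Chinese segmenter such as jieba:

```python
# Hypothetical word embedding matrix V: word -> 3-dimensional vector.
V = {
    "check":   [0.1, 0.0, 0.2],
    "weather": [0.3, 0.1, 0.0],
    "today":   [0.0, 0.2, 0.1],
}

def word_level_features(text):
    # Whitespace split stands in for the word segmentation step.
    words = text.split()
    # Map each word into the embedding matrix and stack the vectors into W_i;
    # unknown words fall back to a zero vector (an assumption for this sketch).
    return [V.get(w, [0.0, 0.0, 0.0]) for w in words]

W_i = word_level_features("check weather today")  # 3-word x 3-dim matrix
```

Each row of `W_i` is one word vector V(x_i), and stacking them gives the word-level feature matrix described above.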
The word embedding matrix V may be obtained by training a Word2vec model on 3 million pieces of text data, or by training on other models, which is not limited in the embodiments of this application. Before or after word segmentation, corpus cleaning, part-of-speech tagging, and stop-word removal may be performed, such as deleting noisy data or removing modal particles according to a preset modal-particle table, which is likewise not limited in the embodiments of this application.
In a specific embodiment, the second extraction unit 4032 is configured to: for the word-level feature matrix W_i extracted from the i-th text, apply m attention models to the word vector matrix W_i to obtain sentence features at m different levels: u_{i,1} to u_{i,m}, where the output of each attention model serves as the input of the next. Taking y_i = {u_{i,1}, …, u_{i,m}} as the sentence-level feature of one text, processing the k texts yields k sentence-level features {y_1, y_2, …, y_k}. As shown in Figure 3, the word-level feature matrix W_i is the input of the first attention model, the output of the first attention model is the input of the second, and so on, finally yielding the m sentence features u_{i,1} to u_{i,m} at different levels; processing with m stacked attention models in this way captures deeper semantic information. It can be understood that, after the same processing, the sentence-level feature of the text to be recognized is obtained as y_{k+1} = {u_{k+1,1}, …, u_{k+1,m}}.
In a specific embodiment, the merging unit 4033 is configured to combine the word-level features and the sentence-level features of the i-th recognized text as the features of the i-th recognized text: [W_i, y_i]; for the k texts in the text queue, k text features are obtained. It can be understood that for the text to be recognized, combining its word-level features and sentence-level features yields [W_{k+1}, y_{k+1}].
In a specific embodiment, the fusion unit 404 is configured to match the k obtained text features and the feature of the text to be recognized using the DAM algorithm, separately on the word-level features W and the sentence-level features y, to obtain the fusion feature of the text to be recognized. Specifically, the word-level features of the text to be recognized are matched with the word-level features of the k obtained texts to produce word-level matching results, which are fused into a first fusion feature; the sentence-level features of the text to be recognized are matched with the sentence-level features of the k texts to produce sentence-level matching results, which are fused into a second fusion feature. The first fusion feature and the second fusion feature are then fused to obtain the fusion feature corresponding to the text to be recognized.
In a specific embodiment, a two-layer convolutional neural network is further applied to the fused features for deeper feature extraction and dimensionality reduction, and finally the softmax function is used for intent classification to obtain the intent corresponding to the recognized text. The intent categories are preset in the intelligent customer service system.
Optionally, in a task-oriented robot customer service system, the intent categories are set to, but are not limited to, checking the weather, setting an alarm, ordering meals, booking tickets, playing songs, and so on. For example, if the customer enters "I want to listen to Jay Chou's songs", it can be classified as a play-song intent; if the customer enters "How is the weather today", it can be classified as a check-weather intent; if the customer enters "Set an alarm for 6 o'clock tomorrow morning for me", it can be classified as a set-alarm intent.
In addition, an embodiment of this application provides an electronic device, which may implement the text intent recognition method of any of the foregoing embodiments of this application. Specifically, the electronic device may be, for example, a terminal device or a server.
An embodiment of this application further provides another electronic device, including:

a processor and a memory, where the processor executes code in the memory to perform the operations of the text intent recognition method of any of the foregoing embodiments of this application.
FIG. 6 is a structural block diagram of an electronic device provided by an embodiment of this application. The electronic device may be the aforementioned text intent recognition device. Referring now to FIG. 6, it shows a schematic structural diagram of an electronic device suitable for implementing a terminal device or server of the embodiments of this application. As shown in FIG. 6, the electronic device includes: one or more processors 601; one or more input devices 602; one or more output devices 603; and a memory 604. The processor 601, input device 602, output device 603, and memory 604 are connected by a bus 605. The memory 604 is used to store instructions, and the processor 601 is used to execute the instructions stored in the memory 604. The processor 601 is configured to call program instructions to execute:
acquiring voice information and a text queue, and converting the voice information into text to be recognized;

extracting features from the text to be recognized and from each text in the text queue to obtain the text feature of the text to be recognized and the text feature corresponding to each text;

obtaining the fusion feature corresponding to the text to be recognized according to the text feature of the text to be recognized and the text feature corresponding to each text;

classifying the fusion feature through an intent classification model to obtain the intent corresponding to the text to be recognized.
It should be understood that in the embodiments of this application, the processor 601 may be a central processing unit (CPU); the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The input device 602 may include a camera, where the camera has the functions of storing image files and transmitting image files; the output device 603 may include a display, a hard disk, a USB flash drive, and the like.
The memory 604 may include a read-only memory and a random access memory, and provides instructions and data to the processor 601. A part of the memory 604 may also include a non-volatile random access memory. For example, the memory 604 may also store information about the device type.
In specific implementations, the processor 601, input device 602, and output device 603 described in the embodiments of this application can execute the implementations described in the various embodiments of the text intent recognition method and system provided by the embodiments of this application, which are not repeated here.
In another embodiment of this application, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program, and the computer program includes program instructions that, when executed by a processor, implement: acquiring voice information and a text queue, and converting the voice information into text to be recognized, where the text queue includes one or more texts; extracting features from the text to be recognized and from each text in the text queue to obtain the text feature of the text to be recognized and the text feature corresponding to each text; obtaining the fusion feature of the text to be recognized according to the text feature of the text to be recognized and the text feature of each text; and classifying the fusion feature through an intent classification model to obtain the intent corresponding to the text to be recognized.
Optionally, when executed by the processor, the program instructions may also implement the other steps of the methods in the foregoing embodiments, which are not repeated here. Further optionally, a storage medium involved in this application, such as the computer-readable storage medium, may be non-volatile or volatile.
The computer-readable storage medium may be an internal storage unit of the electronic device of any of the foregoing embodiments, such as the hard disk or memory of a terminal. The computer-readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the terminal. Further, the computer-readable storage medium may include both an internal storage unit of the electronic device and an external storage device. The computer-readable storage medium is used to store the computer program and the other programs and data required by the electronic device, and may also be used to temporarily store data that has been output or is to be output.
A person of ordinary skill in the art may appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functions differently for each particular application, but such implementations should not be considered beyond the scope of this application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the servers, devices, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, or the implementations of the electronic device described in the embodiments of the invention may be performed, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed server, device, and method may be implemented in other ways. For example, the server embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of this application.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist separately and physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If implemented in the form of a software functional unit and sold or used as an independent product, the integrated unit may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods in the embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any modification or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (20)

  1. A text intent recognition method, comprising:
    acquiring voice information and a text queue, and converting the voice information into text to be recognized, the text queue including one or more texts;
    extracting features from the text to be recognized and from each text in the text queue, to obtain a text feature of the text to be recognized and a text feature corresponding to each text;
    obtaining a fusion feature of the text to be recognized according to the text feature of the text to be recognized and the text feature of each text; and
    classifying the fusion feature with an intent classification model to obtain an intent corresponding to the text to be recognized.
  2. The method according to claim 1, wherein the text queue holds k texts, and after the converting of the voice information into the text to be recognized, the method further comprises:
    when the number of texts in the text queue is less than k, adding the text to be recognized to the text queue, the k texts in the text queue being arranged in the order in which they were added; and
    when the number of texts in the text queue is equal to k, deleting the text that was added to the text queue first, and adding the text to be recognized to the text queue.
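The queue maintenance described in this claim is exactly the behaviour of a fixed-capacity FIFO buffer, and can be sketched with Python's bounded deque (the capacity k = 3 is an arbitrary illustrative value, not one taken from the disclosure):

```python
from collections import deque

k = 3                          # queue capacity; illustrative value only
text_queue = deque(maxlen=k)   # a bounded deque evicts the oldest entry on append

for recognized_text in ["text1", "text2", "text3", "text4"]:
    # While fewer than k texts are present, the text is simply appended;
    # once the queue holds k texts, appending drops the earliest-added one.
    text_queue.append(recognized_text)

print(list(text_queue))        # remaining texts, in order of addition
```

After the loop, "text1" has been evicted and the queue holds the three most recently added texts, oldest first, matching the ordering required by the claim.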
  3. The method according to claim 1, wherein the extracting of features from the text to be recognized and each text in the text queue, to obtain the text feature of the text to be recognized and the text feature of each text, comprises:
    for the text to be recognized, extracting word-level features to obtain word-level features of the text to be recognized, extracting sentence-level features using m attention models to obtain sentence-level features of the text to be recognized, and using the word-level features of the text to be recognized and the sentence-level features of the text to be recognized as the text feature of the text to be recognized, wherein m is a positive integer greater than 1; and
    for each text, extracting word-level features to obtain word-level features of the text, extracting sentence-level features using the m attention models to obtain sentence-level features of the text, and using the word-level features of the text and the sentence-level features of the text as the text feature of the text.
  4. The method according to claim 3, wherein the extracting of word-level features for the text to be recognized, to obtain the word-level features of the text to be recognized, comprises:
    performing word segmentation on the text to be recognized using a word segmentation tool to obtain n words, mapping the n words into a word embedding matrix V to obtain n word vectors, and concatenating the n word vectors to obtain a word vector matrix of the text to be recognized as the word-level features of the text to be recognized; and
    wherein the extracting of word-level features for each text, to obtain the word-level features of each text, comprises:
    performing word segmentation on each text using the word segmentation tool to obtain n words, mapping the n words of the text into the word embedding matrix V to obtain n word vectors, and concatenating the n word vectors to obtain a word vector matrix of the text as the word-level features of the text.
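The word-level step of claims 3 and 4 (segment, look up each of the n words in the embedding matrix V, stack the n word vectors into an n × d matrix) can be sketched as follows. The whitespace split, toy vocabulary, and random matrix are placeholders for a real segmentation tool and trained embeddings:

```python
import random

random.seed(0)
# Hypothetical vocabulary; a real system would use a segmenter's vocabulary.
vocab = {"how": 0, "do": 1, "i": 2, "reset": 3, "my": 4, "password": 5}
d = 4  # embedding dimension; illustrative value
# Hypothetical word-embedding matrix V: one d-dimensional row per vocabulary word.
V = [[random.uniform(-1.0, 1.0) for _ in range(d)] for _ in vocab]

def word_level_features(text):
    # Whitespace split stands in for the word-segmentation tool.
    words = text.lower().split()
    ids = [vocab[w] for w in words if w in vocab]
    # Stacking the n looked-up word vectors yields the n x d word vector
    # matrix used as the word-level feature.
    return [V[i] for i in ids]

matrix = word_level_features("How do I reset my password")
print(len(matrix), len(matrix[0]))  # n rows, d columns
```

For the six-word example sentence this produces a 6 × 4 matrix; in practice n varies per text while d is fixed by the embedding table.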
  5. The method according to claim 3, wherein the extracting of sentence-level features using m attention models, to obtain the sentence-level features of the text to be recognized, comprises:
    processing the word vector matrix of the text to be recognized with the m attention models to obtain features at m different levels, and using the features at the m different levels as the sentence-level features of the text to be recognized, wherein the output of the i-th attention model serves as the input of the (i+1)-th attention model, each of the m attention models outputs features at one level, and i is a positive integer greater than or equal to 1 and less than m.
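The chaining in this claim (the output of model i feeds model i+1, and each of the m models contributes one level of features) can be illustrated with a minimal scaled dot-product self-attention block. The real models are trained attention networks; this parameter-free toy version only shows the data flow:

```python
import math

def attention(X):
    # Minimal scaled dot-product self-attention over an n x d matrix X
    # (no learned projections, for illustration only).
    n, d = len(X), len(X[0])
    scores = [[sum(X[i][k] * X[j][k] for k in range(d)) / math.sqrt(d)
               for j in range(n)] for i in range(n)]
    out = []
    for i in range(n):
        exps = [math.exp(s) for s in scores[i]]     # softmax over row i
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(weights[j] * X[j][k] for j in range(n))
                    for k in range(d)])
    return out

def sentence_level_features(X, m=3):
    levels = []
    for _ in range(m):       # the i-th model's output feeds the (i+1)-th model
        X = attention(X)
        levels.append(X)     # each model yields one level of features
    return levels

X = [[1.0, 0.0], [0.0, 1.0]]             # toy 2 x 2 word vector matrix
levels = sentence_level_features(X, m=3)
print(len(levels))                        # m levels of sentence features
```

Collecting the intermediate outputs, rather than keeping only the last one, is what gives the m distinct levels of sentence features the claim refers to.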
  6. The method according to claim 5, wherein the extracting of sentence-level features using m attention models, to obtain the sentence-level features of each text, comprises:
    processing the word vector matrix of each text with the m attention models to obtain features at m different levels, and using the features at the m different levels as the sentence-level features of the text.
  7. The method according to claim 3, wherein the obtaining of the fusion feature of the text to be recognized according to the text feature of the text to be recognized and the text feature of each text in the text queue comprises:
    matching and fusing, at different granularities, the word-level features of the text to be recognized and the word-level features of each text using a deep attention matching (DAM) algorithm, to obtain a first fusion feature;
    matching and fusing, at different granularities, the sentence-level features of the text to be recognized and the sentence-level features of each text using the deep attention matching (DAM) algorithm, to obtain a second fusion feature; and
    fusing the first fusion feature and the second fusion feature to obtain the fusion feature of the text to be recognized.
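The three fusion steps of claim 7 can be illustrated with a toy similarity-based stand-in for DAM: match at the word level (first fusion feature), match at the sentence level (second fusion feature), then fuse the two, here simply by concatenation. The dot-product similarity and the concatenation are placeholders for the actual deep attention matching computation:

```python
def dot(u, v):
    # Toy similarity standing in for DAM's multi-granularity matching.
    return sum(a * b for a, b in zip(u, v))

def fuse(target_word, target_sent, ctx_words, ctx_sents):
    # First fusion feature: word-level match against each queued text.
    first = [dot(target_word, w) for w in ctx_words]
    # Second fusion feature: sentence-level match against each queued text.
    second = [dot(target_sent, s) for s in ctx_sents]
    # Final fusion of the two fusion features (concatenation, for illustration).
    return first + second

fused = fuse([1.0, 0.0], [0.5, 0.5],            # target word/sentence features
             [[1.0, 1.0], [0.0, 1.0]],          # word features of two queued texts
             [[1.0, 0.0], [0.0, 2.0]])          # sentence features of the same texts
print(fused)
```

The resulting vector carries one word-level and one sentence-level match score per queued text, which is the shape of information the intent classifier of claim 1 consumes.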
  8. A text intent recognition apparatus, comprising:
    an acquisition unit, configured to acquire voice information and a text queue;
    a preprocessing unit, configured to convert the voice information into text to be recognized and add it to the text queue;
    a feature extraction unit, configured to extract features from the text to be recognized and each text in the text queue, to obtain a text feature of the text to be recognized and a text feature corresponding to each text;
    a fusion unit, configured to obtain a fusion feature of the text to be recognized according to the text feature of the text to be recognized and the text feature of each text; and
    a classification unit, configured to classify the fusion feature with an intent classification model to obtain an intent corresponding to the text to be recognized.
  9. A text intent recognition device, comprising a processor and a memory, the processor executing code in the memory to perform the following method:
    acquiring voice information and a text queue, and converting the voice information into text to be recognized, the text queue including one or more texts;
    extracting features from the text to be recognized and from each text in the text queue, to obtain a text feature of the text to be recognized and a text feature corresponding to each text;
    obtaining a fusion feature of the text to be recognized according to the text feature of the text to be recognized and the text feature of each text; and
    classifying the fusion feature with an intent classification model to obtain an intent corresponding to the text to be recognized.
  10. The text intent recognition device according to claim 9, wherein the text queue holds k texts, and after the converting of the voice information into the text to be recognized, the processor is further configured to perform:
    when the number of texts in the text queue is less than k, adding the text to be recognized to the text queue, the k texts in the text queue being arranged in the order in which they were added; and
    when the number of texts in the text queue is equal to k, deleting the text that was added to the text queue first, and adding the text to be recognized to the text queue.
  11. The text intent recognition device according to claim 9, wherein the extracting of features from the text to be recognized and each text in the text queue, to obtain the text feature of the text to be recognized and the text feature of each text, comprises:
    for the text to be recognized, extracting word-level features to obtain word-level features of the text to be recognized, extracting sentence-level features using m attention models to obtain sentence-level features of the text to be recognized, and using the word-level features of the text to be recognized and the sentence-level features of the text to be recognized as the text feature of the text to be recognized, wherein m is a positive integer greater than 1; and
    for each text, extracting word-level features to obtain word-level features of the text, extracting sentence-level features using the m attention models to obtain sentence-level features of the text, and using the word-level features of the text and the sentence-level features of the text as the text feature of the text.
  12. The text intent recognition device according to claim 11, wherein the extracting of word-level features for the text to be recognized, to obtain the word-level features of the text to be recognized, comprises:
    performing word segmentation on the text to be recognized using a word segmentation tool to obtain n words, mapping the n words into a word embedding matrix V to obtain n word vectors, and concatenating the n word vectors to obtain a word vector matrix of the text to be recognized as the word-level features of the text to be recognized; and
    wherein the extracting of word-level features for each text, to obtain the word-level features of each text, comprises:
    performing word segmentation on each text using the word segmentation tool to obtain n words, mapping the n words of the text into the word embedding matrix V to obtain n word vectors, and concatenating the n word vectors to obtain a word vector matrix of the text as the word-level features of the text.
  13. The text intent recognition device according to claim 11, wherein the extracting of sentence-level features using m attention models, to obtain the sentence-level features of the text to be recognized, comprises:
    processing the word vector matrix of the text to be recognized with the m attention models to obtain features at m different levels, and using the features at the m different levels as the sentence-level features of the text to be recognized, wherein the output of the i-th attention model serves as the input of the (i+1)-th attention model, each of the m attention models outputs features at one level, and i is a positive integer greater than or equal to 1 and less than m.
  14. The text intent recognition device according to claim 11, wherein the obtaining of the fusion feature of the text to be recognized according to the text feature of the text to be recognized and the text feature of each text in the text queue comprises:
    matching and fusing, at different granularities, the word-level features of the text to be recognized and the word-level features of each text using a deep attention matching (DAM) algorithm, to obtain a first fusion feature;
    matching and fusing, at different granularities, the sentence-level features of the text to be recognized and the sentence-level features of each text using the deep attention matching (DAM) algorithm, to obtain a second fusion feature; and
    fusing the first fusion feature and the second fusion feature to obtain the fusion feature of the text to be recognized.
  15. A computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the following method:
    acquiring voice information and a text queue, and converting the voice information into text to be recognized, the text queue including one or more texts;
    extracting features from the text to be recognized and from each text in the text queue, to obtain a text feature of the text to be recognized and a text feature corresponding to each text;
    obtaining a fusion feature of the text to be recognized according to the text feature of the text to be recognized and the text feature of each text; and
    classifying the fusion feature with an intent classification model to obtain an intent corresponding to the text to be recognized.
  16. The computer-readable storage medium according to claim 15, wherein the text queue holds k texts, and after the converting of the voice information into the text to be recognized, the instructions, when run on the computer, further cause the computer to perform:
    when the number of texts in the text queue is less than k, adding the text to be recognized to the text queue, the k texts in the text queue being arranged in the order in which they were added; and
    when the number of texts in the text queue is equal to k, deleting the text that was added to the text queue first, and adding the text to be recognized to the text queue.
  17. The computer-readable storage medium according to claim 15, wherein the extracting of features from the text to be recognized and each text in the text queue, to obtain the text feature of the text to be recognized and the text feature of each text, comprises:
    for the text to be recognized, extracting word-level features to obtain word-level features of the text to be recognized, extracting sentence-level features using m attention models to obtain sentence-level features of the text to be recognized, and using the word-level features of the text to be recognized and the sentence-level features of the text to be recognized as the text feature of the text to be recognized, wherein m is a positive integer greater than 1; and
    for each text, extracting word-level features to obtain word-level features of the text, extracting sentence-level features using the m attention models to obtain sentence-level features of the text, and using the word-level features of the text and the sentence-level features of the text as the text feature of the text.
  18. The computer-readable storage medium according to claim 17, wherein the extracting of word-level features for the text to be recognized, to obtain the word-level features of the text to be recognized, comprises:
    performing word segmentation on the text to be recognized using a word segmentation tool to obtain n words, mapping the n words into a word embedding matrix V to obtain n word vectors, and concatenating the n word vectors to obtain a word vector matrix of the text to be recognized as the word-level features of the text to be recognized; and
    wherein the extracting of word-level features for each text, to obtain the word-level features of each text, comprises:
    performing word segmentation on each text using the word segmentation tool to obtain n words, mapping the n words of the text into the word embedding matrix V to obtain n word vectors, and concatenating the n word vectors to obtain a word vector matrix of the text as the word-level features of the text.
  19. The computer-readable storage medium according to claim 17, wherein the extracting of sentence-level features using m attention models, to obtain the sentence-level features of the text to be recognized, comprises:
    processing the word vector matrix of the text to be recognized with the m attention models to obtain features at m different levels, and using the features at the m different levels as the sentence-level features of the text to be recognized, wherein the output of the i-th attention model serves as the input of the (i+1)-th attention model, each of the m attention models outputs features at one level, and i is a positive integer greater than or equal to 1 and less than m.
  20. The computer-readable storage medium according to claim 17, wherein the obtaining of the fusion feature of the text to be recognized according to the text feature of the text to be recognized and the text feature of each text in the text queue comprises:
    matching and fusing, at different granularities, the word-level features of the text to be recognized and the word-level features of each text using a deep attention matching (DAM) algorithm, to obtain a first fusion feature;
    matching and fusing, at different granularities, the sentence-level features of the text to be recognized and the sentence-level features of each text using the deep attention matching (DAM) algorithm, to obtain a second fusion feature; and
    fusing the first fusion feature and the second fusion feature to obtain the fusion feature of the text to be recognized.
PCT/CN2021/083876 2020-11-20 2021-03-30 Text intent recognition method and apparatus, and related device WO2021204017A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011309413.XA CN112417855A (en) 2020-11-20 2020-11-20 Text intention recognition method and device and related equipment
CN202011309413.X 2020-11-20

Publications (1)

Publication Number Publication Date
WO2021204017A1 (en)

Family

ID=74774313

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083876 WO2021204017A1 (en) 2020-11-20 2021-03-30 Text intent recognition method and apparatus, and related device

Country Status (2)

Country Link
CN (1) CN112417855A (en)
WO (1) WO2021204017A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113947700A (en) * 2021-10-18 2022-01-18 北京百度网讯科技有限公司 Model determination method and device, electronic equipment and memory
CN114201970A (en) * 2021-11-23 2022-03-18 国家电网有限公司华东分部 Method and device for capturing power grid scheduling event detection based on semantic features
WO2023088280A1 (en) * 2021-11-19 2023-05-25 北京有竹居网络技术有限公司 Intention recognition method and apparatus, readable medium, and electronic device
CN117131902A (en) * 2023-10-26 2023-11-28 北京布局未来教育科技有限公司 Student intention recognition method based on intelligent teaching and computer equipment

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417855A (en) * 2020-11-20 2021-02-26 平安科技(深圳)有限公司 Text intention recognition method and device and related equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376361A (en) * 2018-11-16 2019-02-22 北京九狐时代智能科技有限公司 A kind of intension recognizing method and device
CN110147445A (en) * 2019-04-09 2019-08-20 平安科技(深圳)有限公司 Intension recognizing method, device, equipment and storage medium based on text classification
CN110309287A (en) * 2019-07-08 2019-10-08 北京邮电大学 The retrieval type of modeling dialog round information chats dialogue scoring method
CN111221944A (en) * 2020-01-13 2020-06-02 平安科技(深圳)有限公司 Text intention recognition method, device, equipment and storage medium
US20200242486A1 (en) * 2019-01-29 2020-07-30 Ricoh Company, Ltd. Method and apparatus for recognizing intention, and non-transitory computer-readable recording medium
CN112417855A (en) * 2020-11-20 2021-02-26 平安科技(深圳)有限公司 Text intention recognition method and device and related equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113947700A (en) * 2021-10-18 2022-01-18 北京百度网讯科技有限公司 Model determination method and device, electronic equipment and memory
WO2023088280A1 (en) * 2021-11-19 2023-05-25 北京有竹居网络技术有限公司 Intention recognition method and apparatus, readable medium, and electronic device
CN114201970A (en) * 2021-11-23 2022-03-18 国家电网有限公司华东分部 Method and device for power grid scheduling event detection based on semantic features
CN117131902A (en) * 2023-10-26 2023-11-28 北京布局未来教育科技有限公司 Student intention recognition method based on intelligent teaching and computer equipment
CN117131902B (en) * 2023-10-26 2024-02-27 北京布局未来科技发展有限公司 Student intention recognition method based on intelligent teaching and computer equipment

Also Published As

Publication number Publication date
CN112417855A (en) 2021-02-26

Similar Documents

Publication Publication Date Title
WO2021204017A1 (en) Text intent recognition method and apparatus, and related device
CN109101537B (en) Multi-turn dialogue data classification method and device based on deep learning and electronic equipment
WO2020244073A1 (en) Speech-based user classification method and device, computer apparatus, and storage medium
WO2020077895A1 (en) Signing intention determining method and apparatus, computer device, and storage medium
WO2021114840A1 (en) Scoring method and apparatus based on semantic analysis, terminal device, and storage medium
CN110263160B (en) Question classification method in computer question-answering system
WO2021000497A1 (en) Retrieval method and apparatus, and computer device and storage medium
WO2021190259A1 (en) Slot identification method and electronic device
WO2020233131A1 (en) Question-and-answer processing method and apparatus, computer device and storage medium
WO2021135455A1 (en) Semantic recall method, apparatus, computer device, and storage medium
WO2022252636A1 (en) Artificial intelligence-based answer generation method and apparatus, device, and storage medium
CN111401077A (en) Language model processing method and device and computer equipment
CN111967264B (en) Named entity identification method
CN111274797A (en) Intention recognition method, device and equipment for terminal and storage medium
CN110377733B (en) Text-based emotion recognition method, terminal equipment and medium
WO2023045605A1 (en) Data processing method and apparatus, computer device, and storage medium
CN114330354B (en) Event extraction method and device based on vocabulary enhancement and storage medium
CN116304748B (en) Text similarity calculation method, system, equipment and medium
CN112487827A (en) Question answering method, electronic equipment and storage device
WO2024098524A1 (en) Text and video cross-searching method and apparatus, model training method and apparatus, device, and medium
WO2022022049A1 (en) Long difficult text sentence compression method and apparatus, computer device, and storage medium
CN111310462A (en) User attribute determination method, device, equipment and storage medium
CN112906368B (en) Industry text increment method, related device and computer program product
WO2021217866A1 (en) Method and apparatus for ai interview recognition, computer device and storage medium
Andriyanov Combining Text and Image Analysis Methods for Solving Multimodal Classification Problems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 21784383; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 21784383; Country of ref document: EP; Kind code of ref document: A1