WO2020140612A1

WO2020140612A1 - Convolutional neural network-based intention recognition method, apparatus, device, and medium

Info

Publication number: WO2020140612A1
Application number: PCT/CN2019/117097
Authority: WO
Inventors: 王健宗; 程宁; 肖京
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-01-04
Filing date: 2019-11-11
Publication date: 2020-07-09
Also published as: CN109829153A

Abstract

The present application discloses a convolutional neural network-based intention recognition method, an apparatus, a device and a medium, being applied to the field of deep learning technology, and being used for solving the problem of low accuracy of intention recognition. Said method provided in the present application comprises: acquiring a target text of which the intention is to be recognized; performing vectorization processing on the target text, to obtain a target vector; putting the target vector, as an input, into a pre-trained convolutional neural network, to obtain a target result vector outputted by the convolutional neural network, respective elements in the target result vector being respectively first probability values corresponding to respective preset user intentions, a first probability value characterizing the probability that the target text indicates the corresponding preset user intention; and determining the preset user intention having the highest first probability value as a target user intention corresponding to the target text.

Description

Intent recognition method, device, equipment and medium based on convolutional neural network

[0001] This application is based on the Chinese invention patent application with the application number 201910007860.0 filed on January 04, 2019, titled "Intent Recognition Method, Device, Equipment, and Media Based on Convolutional Neural Network", and claims its priority right.

Technical field

[0002] The present application relates to the field of deep learning technology, and in particular, to an intent recognition method, apparatus, device, and medium based on a convolutional neural network.

[0003]

[0004] BACKGROUND

[0005] In the market, accurately grasping the user's intention from the user's words is very helpful in facilitating a transaction. For example, in a telemarketing scenario, recognizing the intention behind the user's speech is crucial to the success of the product. Speech is the external expression of the user's inner thoughts and reveals the user's true feelings and internal needs. Capturing the user's intention correctly can increase the success rate of marketing, corporate revenue, and brand awareness without harming the user experience.

[0006] At present, most enterprises employ customer service personnel to communicate with users, and rely on the experience and knowledge of those personnel to determine the user's true intention and facilitate transactions. However, experience and knowledge differ between customer service personnel, and human subjective factors also play a part, so the user's true intention is easily misjudged, resulting in low accuracy of intention recognition.

[0007]

SUMMARY OF THE INVENTION

[0009] Embodiments of the present application provide an intent recognition method, device, computer device, and storage medium based on a convolutional neural network to solve the problem of low accuracy of intent recognition.

[0010] An intention recognition method based on a convolutional neural network includes:

[0011] Obtaining the target text of the intention to be identified;

[0012] Vectorizing the target text to obtain a target vector;

[0013] Inputting the target vector into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, where each element in the target result vector is a first probability value corresponding to a preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention;

[0014] The preset user intention with the highest first probability value is determined as the target user intention corresponding to the target text.

[0015] An intention recognition device based on a convolutional neural network includes:

[0016] a target text acquisition module, configured to acquire target text of an intention to be identified;

[0017] a text vectorization module, configured to vectorize the target text to obtain a target vector;

[0018] a network recognition module, configured to input the target vector into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, where each element in the target result vector is a first probability value corresponding to a preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention;

[0019] an intention determining module, configured to determine the preset user intention with the highest first probability value as the target user intention corresponding to the target text.

[0020] A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor, when executing the computer-readable instructions, implements the steps of the above intent recognition method based on a convolutional neural network.

[0021] One or more readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of the above intent recognition method based on a convolutional neural network.

[0022] The details of one or more embodiments of the present application are set forth in the following drawings and description, and other features and advantages of the present application will become apparent from the description, drawings, and claims.

[0023]

BRIEF DESCRIPTION OF THE DRAWINGS

[0025] To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

[0026] FIG. 1 is a schematic diagram of an application environment of an intent recognition method based on a convolutional neural network in an embodiment of the present application;

[0027] FIG. 2 is a flowchart of an intent recognition method based on a convolutional neural network in an embodiment of the present application;

[0028] FIG. 3 is a schematic flowchart of an intent recognition method step 102 based on a convolutional neural network in an application scenario in an embodiment of the present application;

[0029] FIG. 4 is a schematic diagram of a process of preprocessing target text in an application scenario based on a convolutional neural network-based intent recognition method in an embodiment of the present application;

[0030] FIG. 5 is a schematic flowchart of pre-training a convolutional neural network in an application scenario based on an intent recognition method of a convolutional neural network in an embodiment of the present application;

[0031] FIG. 6 is a schematic structural diagram of an intention recognition device based on a convolutional neural network in an embodiment of the present application;

[0032] FIG. 7 is a schematic structural view of some modules of a pre-trained convolutional neural network in an intention recognition device based on a convolutional neural network in an embodiment of the present application;

[0033] FIG. 8 is a schematic structural diagram of a network identification module in an embodiment of the present application;

[0034] FIG. 9 is a schematic diagram of a computer device in an embodiment of the present application.

[0035]

DETAILED DESCRIPTION

[0037] The technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the protection scope of this application.

[0038] The intent recognition method based on the convolutional neural network provided in the present application can be applied in an application environment as shown in FIG. 1, wherein the client communicates with the server through the network. Among them, the client may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The server can be implemented with an independent server or a server cluster composed of multiple servers.

[0039] In an embodiment, as shown in FIG. 2, a method for intent recognition based on a convolutional neural network is provided. The method is applied to the server in FIG. 1 as an example for illustration, including the following steps:

[0040] 101. Obtain the target text of the intention to be identified;

[0041] In this embodiment, the server may obtain the target text whose intention is to be recognized according to actual usage needs or the needs of the application scenario. For example, the server can communicate with a client through which users ask questions: the user inputs a voice question through the microphone of the client, the client uploads the voice question to the server, and the server transcribes the voice question into text. That text is the target text whose intention is to be recognized. Alternatively, the server can perform intention recognition on a large batch of utterance texts: a database collects a large number of utterance texts in advance and transmits them to the server through the network, and the server needs to recognize the intention of each of them, so each of these utterance texts is a target text. It can be understood that the server can also obtain the target text in various other ways, which are not described in detail here. Any text whose intention the server is required to recognize can serve as the target text.

[0042] It should be noted that the text referred to in this embodiment generally means utterance text, that is, text content obtained by transcribing words spoken by a person.

[0043] 102. Vectorize the target text to obtain a target vector;

[0044] After the target text is acquired, to facilitate subsequent recognition and learning by the convolutional neural network, the server needs to vectorize the target text, that is, represent the text as a vector, to obtain the target vector. Specifically, the server may record the target text in the form of a data matrix, in which each word in the target text is mapped to one row vector.

[0045] For ease of understanding, in a specific application scenario, as shown in FIG. 3, further, the step 102 may specifically include:

[0046] 201. Using a preset dictionary to convert each word in the target text into a one-dimensional row vector, where the dictionary records the correspondence between words and one-dimensional row vectors;

[0047] 202. Combining the one-dimensional row vectors into a two-dimensional vector, in the order of the words in the target text, as the target vector.

[0048] For the above step 201, the server presets a dictionary that records the one-to-one correspondence between words and one-dimensional row vectors. For example, "I" can be set to correspond to "row vector No. 1", "and" to "row vector No. 2", "you" to "row vector No. 3", and so on; the dictionary is completed by enumerating as many words as possible, so that when a target text needs to be converted, the server can use the preset dictionary to convert each word in the target text into a one-dimensional row vector. For example, suppose the target text is "I and you go to dinner". Querying the dictionary shows that "I" corresponds to "row vector No. 1", "and" to "row vector No. 2", "you" to "row vector No. 3", "go" to "row vector No. 4", and "dinner" to "row vector No. 5", so row vectors 1-5 are obtained respectively. Here, row vectors 1-5 refer to the row vectors numbered 1, 2, 3, 4, and 5. Specifically, each row vector should be a one-dimensional matrix containing multiple elements; for example, [7, 51, 423, 50, 0] is a one-dimensional row vector. When setting the dictionary, such a one-dimensional row vector can be defined as the k-th row vector, where k is greater than or equal to 1.

[0049] Preferably, the dictionary may be constructed automatically, with the server building the dictionary while using it. Specifically, when a text needs to be converted into one-dimensional row vectors, the server obtains the words in the text one by one and checks whether the dictionary records a correspondence between the word and a one-dimensional row vector. If it does, the server obtains the one-dimensional row vector corresponding to the word. If it does not, the server adds the word to the dictionary, assigns it a one-dimensional row vector that has not yet been assigned, and then obtains that row vector. After the server has obtained the one-dimensional row vectors corresponding to all the words in the text, it can perform the following step 202 to construct the two-dimensional vector. In the process, the words in the text that were not previously in the dictionary have also been added to it, which makes the dictionary more complete.
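The look-up-or-assign behaviour described in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: the class name `GrowingDictionary`, the vector length of 5, and the use of seeded random numbers as the "not yet assigned" row vectors are all hypothetical choices, not details given in the application.

```python
import random


class GrowingDictionary:
    """Maps each word to a fixed row vector, adding unseen words on first use."""

    def __init__(self, dim=5, seed=0):
        self.dim = dim                    # length of every row vector (assumed)
        self._rng = random.Random(seed)   # seeded so that runs are repeatable
        self._table = {}                  # word -> one-dimensional row vector

    def lookup(self, word):
        # If the word is not yet recorded, assign it a fresh row vector
        # (random values stand in for an unassigned vector here).
        if word not in self._table:
            self._table[word] = [
                self._rng.uniform(-1.0, 1.0) for _ in range(self.dim)
            ]
        return self._table[word]

    def convert(self, words):
        # One row vector per word, in the order the words appear in the text.
        return [self.lookup(w) for w in words]


dictionary = GrowingDictionary(dim=5)
rows = dictionary.convert(["I", "and", "you", "go", "dinner"])
again = dictionary.convert(["you", "go"])   # reuses the vectors assigned above
```

A word always maps to the same row vector once it has been added, so `again[0]` is the same vector as `rows[2]`.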

[0050] It should be noted that, when setting the dictionary, the unassigned one-dimensional row vectors can be set manually by staff, or existing word vectors can be obtained from a third-party platform; for example, word vectors provided by websites such as Sina, Zhi can be loaded and used as the one-dimensional row vectors required for setting the dictionary in this embodiment.

[0051] For the above step 202, the five one-dimensional row vectors "1, 2, 3, 4, 5" are arranged in order to form a two-dimensional matrix, that is, a two-dimensional vector, which is the target vector. The rows X1-X5 of this matrix are respectively the above-mentioned row vectors 1-5.
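Steps 201 and 202 together amount to a table look-up followed by stacking the rows in text order. The sketch below assumes a small fixed dictionary; the name `DICTIONARY`, the function `vectorize`, and every numeric value are made up for illustration (the first row simply reuses the example vector quoted in the description).

```python
# Toy dictionary: word -> one-dimensional row vector.
# All values are illustrative placeholders, not data from the application.
DICTIONARY = {
    "I":      [7, 51, 423, 50, 0],
    "and":    [3, 8, 15, 2, 9],
    "you":    [12, 0, 77, 4, 31],
    "go":     [5, 5, 60, 18, 2],
    "dinner": [9, 140, 6, 33, 1],
}


def vectorize(words, dictionary):
    """Stack the row vector of each word, in text order, into a 2-D matrix."""
    return [list(dictionary[w]) for w in words]


# Rows X1..X5 of the result follow the word order of the text.
target_vector = vectorize(["I", "and", "you", "go", "dinner"], DICTIONARY)
```

The result has one row per word and one column per row-vector element, so a five-word text with five-element vectors yields a 5 x 5 matrix.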

[0052] Considering the diversity of users, the target text in step 102 may not meet the format requirements or may contain a lot of interfering information. Therefore, in this embodiment, the target text may be preprocessed before it is converted into the target vector, so that its format and content are better suited to vector conversion and to subsequent recognition and analysis by the convolutional neural network. As shown in FIG. 4, further, before step 102, the method further includes:

[0053] 301. Delete specified text in the target text, where the specified text includes at least stop words or punctuation marks;

[0054] 302. Perform word segmentation processing on the target text after deleting the specified text to obtain each word in the target text.

[0055] For the above step 301, the stop words mentioned here may refer to single Chinese characters that are used with particularly high frequency, such as "的" and "了", which carry no actual meaning. In addition, the specified text may also include punctuation marks, such as commas and periods, which likewise carry no actual meaning. When performing step 301, the server can delete the specified text from the target text. For example, suppose the specified text includes stop words and punctuation marks, and the target text is "I am coming to work today." The stop word "了", which has no practical meaning, and the punctuation mark "." are deleted, yielding the text "I come to work today".

[0056] For the above step 302, after deleting the specified text, the server can also perform word segmentation on the target text. Continuing with the above text "I come to work today", the server can use a third-party word segmentation tool to segment the text into the four words of "I come to work today".
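A rough sketch of steps 301 and 302 is given below. It rests on several assumptions: the Chinese sentence is an assumed rendering of the translated example, the sets `STOP_WORDS`, `PUNCTUATION`, and `VOCABULARY` are toy data, and forward maximum matching is swapped in as a simple stand-in for the unnamed third-party word segmentation tool.

```python
STOP_WORDS = {"了", "的"}              # the stop-word examples named in the text
PUNCTUATION = {"。", "，", ".", ","}   # a few punctuation marks (assumed set)


def delete_specified_text(text, specified=STOP_WORDS | PUNCTUATION):
    """Step 301: drop every character that belongs to the specified text."""
    return "".join(ch for ch in text if ch not in specified)


def segment(text, vocabulary, max_len=4):
    """Step 302 stand-in: forward maximum matching over a toy vocabulary."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                words.append(piece)
                i += size
                break
    return words


VOCABULARY = {"我", "今天", "来", "上班"}   # hypothetical toy vocabulary

cleaned = delete_specified_text("我今天来上班了。")
words = segment(cleaned, VOCABULARY)
```

With this toy vocabulary the cleaned text splits into four words, matching the count given in the description. Deleting stop words character by character, as done here, can also remove a character that is part of a longer word, so a production system would likely rely on the segmentation tool's own stop-word handling.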

[0057] 103. Input the target vector into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, where each element in the target result vector is a first probability value corresponding to a preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention;

[0058] After obtaining the target vector corresponding to the target text, the server may input the target vector into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network. Each element in the target result vector is a first probability value corresponding to a preset user intention, where the first probability value represents the probability that the target text belongs to the corresponding preset user intention. It can be understood that the target result vector generally contains multiple elements, each of which is a first probability value; these first probability values correspond one-to-one with the preset user intentions and indicate the probability that the target text belongs to the corresponding preset user intention. It follows that the larger the first probability value corresponding to a preset user intention, the higher the probability that the target text belongs to that preset user intention.

[0059] For ease of understanding, the training process of the convolutional neural network will be described in detail below. As shown in FIG. 5, further, the convolutional neural network is pre-trained through the following steps:

[0060] 401. Collect utterance texts belonging to each preset user intention, respectively;

[0061] 402. Perform vectorization processing on the collected utterance texts respectively to obtain a sample vector corresponding to each utterance text;

[0062] 403. For each preset user intention, record the label value of the sample vectors corresponding to the preset user intention as 1, and record the label value of the other sample vectors as 0, where the other sample vectors are the sample vectors other than those corresponding to the preset user intention;

[0063] 404. For each preset user intention, input all sample vectors into the convolutional neural network for training to obtain sample result vectors, where a sample result vector is composed of multiple elements, and the elements respectively represent the probability that the utterance text corresponding to the sample vector belongs to each preset user intention;

[0064] 405. For each preset user intention, take the output sample result vectors as the adjustment target, and adjust the parameters of the convolutional neural network to minimize the error between each obtained sample result vector and the label value corresponding to each sample vector;

[0065] 406. If the error between each sample result vector and the label value corresponding to each sample vector satisfies the preset training termination condition, determine that the convolutional neural network has been trained.

[0066] For the above step 401, in this embodiment, staff may set in advance on the server, according to the actual application scenario, the preset user intentions that need to be trained, for example "agree to listen", "refuse to buy", and "willing to wait". For these preset user intentions, the staff also need to collect the corresponding utterance texts in the specific application scenario, such as utterance texts converted from questions actually asked by users. When collecting utterance texts, the server can collect the utterance texts belonging to each preset user intention through channels such as professional knowledge bases and network databases. It should be noted that the number of utterance texts corresponding to each preset user intention should reach a certain order of magnitude; the numbers for different preset user intentions may differ, but should not differ too greatly, to avoid affecting the training effect of the convolutional neural network. For example, the collected utterance texts may be: 1 million utterance texts corresponding to "agree to listen", 200,000 corresponding to "refuse to buy", and 300,000 corresponding to "willing to wait".

[0067] For the above step 402, it can be understood that, before the utterance texts are input into the convolutional neural network for training, the collected utterance texts need to be vectorized separately to obtain the sample vector corresponding to each utterance text; converting text into vectors makes it easier for the convolutional neural network to understand and train on. It should be noted that, because the collected utterance texts come from many sources, their formats are often inconsistent, which can easily interfere with subsequent training. Therefore, the server can preprocess these utterance texts before vectorizing them, including deleting stop words and punctuation and performing word segmentation. For example, assuming a certain text is "I'm coming to work today.", the server can first delete the stop word "了", which has no practical meaning, and the punctuation mark ".", and then use a third-party word segmentation tool to segment the text into the four words of "I come to work today". After preprocessing, the server vectorizes each word in the utterance text to obtain the row vector corresponding to each word. Vectorizing every word in the utterance text yields multiple row vectors, which together constitute the sample vector (a two-dimensional vector) corresponding to the text. Specifically, the sample vector may be recorded in the form of a data matrix.

[0068] For the above step 403, it can be understood that the sample vectors need to be labeled before training. In this embodiment, because training needs to be performed for multiple preset user intentions, the labeling should be done separately for each preset user intention. For example, assume there are three preset user intentions in total, namely "agree to listen", "refuse to buy", and "willing to wait". Then, for "agree to listen", the label value of each sample vector under "agree to listen" is recorded as 1, and the label value of each sample vector under "refuse to buy" and "willing to wait" is recorded as 0; these labels are used for the subsequent training of the convolutional neural network for "agree to listen". For "refuse to buy", the label value of each sample vector under "refuse to buy" is recorded as 1, and the label value of each sample vector under "agree to listen" and "willing to wait" is recorded as 0; these labels are used for the subsequent training of the convolutional neural network for "refuse to buy". "Willing to wait" and any other preset user intentions are handled in the same way, which is not repeated here.

[0069] For the above step 404, during training, for each preset user intention, all sample vectors are input into the convolutional neural network for training to obtain sample result vectors. It can be understood that, for a given preset user intention, the label value of each sample vector under that preset user intention is 1, and the label values of all other sample vectors are 0. After a sample vector is input into the convolutional neural network, the convolutional neural network outputs a sample result vector composed of N elements, and these N elements respectively represent the probability that the utterance text corresponding to the sample vector belongs to each of the N preset user intentions.
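The per-intention labeling of step 403 can be sketched in a few lines. The helper name `label_values` and the four-sample toy data are assumptions made for illustration; only the three intention names come from the description.

```python
INTENTS = ["agree to listen", "refuse to buy", "willing to wait"]


def label_values(sample_intents, target_intent):
    """Return 1 for samples under target_intent and 0 for all other samples."""
    return [1 if intent == target_intent else 0 for intent in sample_intents]


# Intention of each collected sample, in collection order (toy data).
sample_intents = [
    "agree to listen",
    "refuse to buy",
    "willing to wait",
    "agree to listen",
]

# One list of label values per preset user intention.
labels_by_intent = {
    intent: label_values(sample_intents, intent) for intent in INTENTS
}
```

Each sample is labeled 1 under exactly one intention, so reading the three label lists column by column gives a one-hot vector per sample.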

[0070] Further, step 404 may specifically include: when each sample vector is input into the convolutional neural network for training, randomly discarding the feature map output by the convolutional layer according to a preset first discard probability, using convolution windows of each preset size to extract features from the feature map and performing the pooling operation, then randomly discarding the output vector obtained by the pooling operation according to a preset second discard probability, and inputting the output vector remaining after discarding into the fully connected layer to obtain the sample result vector. The first discard probability and the second discard probability may be set according to actual usage, for example to 0.6. It can be seen that a discard operation on the feature map and a discard operation on the pooled output vector are added here. Although this weakens the effect of each individual training pass on the convolutional neural network to some extent, it greatly speeds up each pass, which allows the convolutional neural network to complete training on a large number of samples in a short time. Training on a large number of samples is in turn more conducive to improving the overall training effect of the convolutional neural network, and yields higher recognition accuracy.
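The random discard applied to the pooled output vector can be illustrated as below. This is a minimal sketch, assuming that "discarding" means setting an element to zero independently with the discard probability; the function name `random_discard`, the seed, and the sample values are hypothetical.

```python
import random


def random_discard(values, discard_probability, rng):
    """Zero each element independently with the given discard probability."""
    return [0.0 if rng.random() < discard_probability else v for v in values]


rng = random.Random(42)                     # seeded for repeatability
pooled = [0.8, 0.1, 0.5, 0.9, 0.3, 0.7]     # toy output of the pooling operation
kept = random_discard(pooled, 0.6, rng)     # 0.6 is the example value in the text
```

Many dropout implementations also rescale the surviving elements by 1 / (1 - p) so that the expected magnitude is unchanged; the description does not say whether such rescaling is used, so it is left out here.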

[0071] For the above step 405, it can be understood that the parameters of the convolutional neural network need to be adjusted during training. For example, the network structure of a convolutional neural network mainly includes a convolutional layer, a pooling layer, a random deactivation (dropout) layer, a regularization layer, and a softmax layer. Each layer has several parameters, and adjusting these parameters during the training on a sample affects the output of the convolutional neural network. For example, suppose that, for the preset user intention "agree to listen", a sample vector under "agree to listen" is input into the convolutional neural network and the output result is [0.3, 0.2, 0.5]. The values of the three elements in the result represent the probability that the utterance text corresponding to the sample vector belongs to the three preset user intentions "agree to listen", "refuse to buy", and "willing to wait" respectively: the probability that the utterance text belongs to "agree to listen" is 0.3, the probability that it belongs to "refuse to buy" is 0.2, and the probability that it belongs to "willing to wait" is 0.5. The label value of the sample vector shows that the utterance text belongs to "agree to listen", so the parameters of the convolutional neural network can be adjusted to bring its output as close as possible to [1, 0, 0]; most importantly, the value of the element corresponding to "agree to listen" in the output result should be as close to 1 as possible.
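To make the notion of "error" in this example concrete, the sketch below measures the gap between the output [0.3, 0.2, 0.5] and the target [1, 0, 0] in two common ways. The description does not name a loss function, so both squared error and cross-entropy are assumptions chosen for illustration, as is the second, improved output.

```python
import math


def squared_error(output, label):
    """Sum of squared differences between output and label values."""
    return sum((o - t) ** 2 for o, t in zip(output, label))


def cross_entropy(output, label):
    """Negative log-probability of the labeled class (label entries of 0 are skipped)."""
    return -sum(t * math.log(o) for o, t in zip(output, label) if t > 0)


label = [1, 0, 0]                 # the sample belongs to "agree to listen"
before = [0.3, 0.2, 0.5]          # output quoted in the description
after = [0.9, 0.05, 0.05]         # hypothetical output after further adjustment

error_before = squared_error(before, label)   # 0.49 + 0.04 + 0.25
error_after = squared_error(after, label)     # 0.01 + 0.0025 + 0.0025
```

Under either measure the error shrinks as the element for "agree to listen" approaches 1, which is the direction the parameter adjustment in step 405 aims for.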

[0072] For the above step 406, after the above steps 403-405 have been completed for each preset user intention, it can be determined whether the error between each sample result vector and the label value corresponding to each sample vector satisfies the preset training termination condition. If it does, the parameters of the convolutional neural network have been adjusted into place, and it can be determined that the convolutional neural network has been trained; if it does not, the convolutional neural network needs further training. The training termination condition may be preset according to actual usage. Specifically, it may be set as follows: if the error between each sample result vector and the label value corresponding to each sample vector is less than a specified error value, the preset training termination condition is considered satisfied. Alternatively, it may be set as follows: the above steps 402-404 are performed using the utterance texts in a validation set, and if the error between the sample result vector output by the convolutional neural network and the label value is within a certain range, the preset training termination condition is considered satisfied. The utterance texts in the validation set are collected in a manner similar to the above step 401. Specifically, after the above step 401 has been performed to collect the utterance texts of each preset user intention, a certain proportion of the collected utterance texts is assigned to the training set, and the remaining utterance texts are assigned to the validation set. For example, 80% of the collected utterance texts can be randomly assigned as training-set samples for subsequently training the convolutional neural network, and the other 20% can be assigned as validation-set samples for subsequently verifying whether the training of the convolutional neural network is complete, that is, whether the preset training termination condition is satisfied.
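The random 80/20 split and the first form of the termination check can be sketched as follows. The function names, the seed, the threshold value, and the ten placeholder utterances are all hypothetical; the per-sample errors would in practice come from comparing sample result vectors with their label values.

```python
import random


def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def training_finished(errors, max_error):
    """True when every per-sample error is below the specified error value."""
    return all(e < max_error for e in errors)


samples = [f"utterance {i}" for i in range(10)]   # placeholder utterance texts
train_set, validation_set = split_samples(samples)

done = training_finished([0.01, 0.02], max_error=0.05)
not_done = training_finished([0.01, 0.20], max_error=0.05)
```

Splitting after shuffling keeps the two sets disjoint while giving each collected utterance the same chance of landing in either set.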

[0073] In this embodiment, as the above description of step 404 shows, a random discard mechanism can be added during training to improve the training efficiency of the convolutional neural network. By contrast, when the convolutional neural network is used for recognition, the random discard mechanism is not applied, so as to ensure recognition accuracy. For ease of understanding, further, step 103 may specifically include: after the target vector is input into the convolutional neural network, for the feature map output by the convolutional layer in the convolutional neural network, using the convolution windows in the convolutional neural network to extract features from the feature map and performing the pooling operation, and then inputting the output vector obtained by the pooling operation into the fully connected layer in the convolutional neural network to obtain the target result vector. It should be noted that the convolution windows used here are those of the trained convolutional neural network. Their size and number have already been adjusted during training, so neither needs to be considered here.
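A highly simplified forward pass without any random discard might look like the sketch below. It is an illustration under several assumptions that the description does not fix: windows span whole rows of the input matrix, a ReLU activation follows the convolution, pooling takes the maximum over positions, and a softmax turns the fully connected scores into probabilities. All weights and inputs are toy values.

```python
import math


def convolve(matrix, window):
    """Slide an h x d window down the rows of an n x d matrix (valid positions only)."""
    h = len(window)
    features = []
    for start in range(len(matrix) - h + 1):
        total = sum(
            window[r][c] * matrix[start + r][c]
            for r in range(h)
            for c in range(len(window[r]))
        )
        features.append(max(0.0, total))   # ReLU, an assumed activation
    return features


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(matrix, windows, fc_weights, fc_bias):
    # One pooled value per convolution window (max pooling, assumed).
    pooled = [max(convolve(matrix, w)) for w in windows]
    # Fully connected layer: one score per preset user intention.
    scores = [
        sum(wi * p for wi, p in zip(row, pooled)) + b
        for row, b in zip(fc_weights, fc_bias)
    ]
    return softmax(scores)


# Toy input: 4 words, each a 2-element row vector.
target_vector = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
windows = [
    [[1.0, 0.0], [0.0, 1.0]],               # window covering 2 words
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],   # window covering 3 words
]
fc_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # 3 intentions x 2 features
fc_bias = [0.0, 0.0, 0.0]

target_result_vector = forward(target_vector, windows, fc_weights, fc_bias)
```

Because nothing is discarded at this stage, the same input always produces the same target result vector, which is the property the paragraph above relies on for recognition accuracy.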

[0074] 104. Determine the preset user intention with the highest first probability value as the target user intention corresponding to the target text.

[0075] It can be understood that after the server obtains the target result vector output by the convolutional neural network, each element in the target result vector is a first probability value corresponding to each preset user intention, and the first probability value It represents the probability that the target text belongs to the corresponding preset user intention, which means that the higher the first probability value, the higher the probability that the target text belongs to the preset user intention. Therefore, the server selects the preset user intent with the highest first probability value to determine as the target user intent corresponding to the target text, which grasps the actual situation and real intention of the user to the greatest extent.
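Step 104 reduces to picking the position of the largest element in the target result vector. The sketch below assumes the three intention names used earlier in the description and a hypothetical helper `pick_intent`; the probability values are illustrative.

```python
INTENTS = ["agree to listen", "refuse to buy", "willing to wait"]


def pick_intent(target_result_vector, intents=INTENTS):
    """Return the preset user intention with the highest first probability value."""
    best = max(
        range(len(target_result_vector)),
        key=target_result_vector.__getitem__,
    )
    return intents[best]


target_intent = pick_intent([0.3, 0.2, 0.5])
```

If two intentions tie for the highest value, this sketch returns the one listed first; the description does not say how ties should be resolved.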

[0076] In the embodiment of the present application, first, the target text whose intention is to be recognized is obtained; then, the target text is vectorized to obtain a target vector; next, the target vector is input into the pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, where each element in the target result vector is a first probability value corresponding to a preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention; finally, the preset user intention with the highest first probability value is determined as the target user intention corresponding to the target text. It can be seen that this application can accurately identify the user's true intention from the target text through the pre-trained convolutional neural network, which not only avoids the recognition bias caused by gaps in experience and knowledge, but also eliminates the influence of human subjective factors. The accuracy of intent recognition is thereby improved, which helps companies grasp the true intentions of users and facilitate transactions.

[0077] It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.

[0079] In an embodiment, an intent recognition device based on a convolutional neural network is provided. The intent recognition device based on a convolutional neural network corresponds to the intent recognition method based on the convolutional neural network in the above embodiment. As shown in FIG. 6, the intention recognition device based on a convolutional neural network includes a target text acquisition module 501, a text vectorization module 502, a network recognition module 503, and an intention determination module 504. The detailed description of each function module is as follows:

[0080] The target text acquisition module 501 is used to acquire the target text of the intention to be identified;

[0081] The text vectorization module 502 is configured to vectorize the target text to obtain a target vector;

[0082] The network recognition module 503 is configured to input the target vector into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, where each element in the target result vector is a first probability value corresponding to a preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention;

[0083] The intention determining module 504 is configured to determine the preset user intention with the highest first probability value as the target user intention corresponding to the target text.

[0084] As shown in FIG. 7, further, the convolutional neural network may be pre-trained by the following modules:

[0085] The text collection module 505 is configured to separately collect utterance texts belonging to each preset user intention;

[0086] The sample vectorization module 506 is used to vectorize the collected utterance text to obtain a sample vector corresponding to each utterance text;

[0087] The sample marking module 507 is configured to, for each preset user intention, record the label value of the sample vectors corresponding to that preset user intention as 1 and the label value of the other sample vectors as 0, where the other sample vectors are the sample vectors other than those corresponding to that preset user intention;

[0088] The sample input module 508 is configured to, for each preset user intention, input all sample vectors into the convolutional neural network for training to obtain sample result vectors, where each sample result vector is composed of elements that respectively represent the probability that the utterance text corresponding to the sample vector belongs to each preset user intention;

[0089] The network parameter adjustment module 509 is configured to, for each preset user intention, take the output sample result vectors as the adjustment target and adjust the parameters of the convolutional neural network so as to minimize the error between the obtained sample result vectors and the label values corresponding to each sample vector;

[0090] The training completion determination module 510 is configured to determine that the convolutional neural network has been trained if the error between each sample result vector and the label value corresponding to each sample vector satisfies a preset training termination condition.
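The labeling, error, and termination logic of modules 507 to 510 can be sketched as follows. The application does not name the error function or the termination condition, so the mean squared error and the `threshold` value below are assumptions, and the intention names and network outputs are hypothetical.

```python
def label_values(sample_intents, preset_intents):
    """For each sample, record 1 at its own preset intention and 0 elsewhere."""
    return [[1.0 if intent == preset else 0.0 for preset in preset_intents]
            for intent in sample_intents]

def mean_squared_error(sample_result_vectors, labels):
    """Average squared gap between network outputs and label values (assumed metric)."""
    total, count = 0.0, 0
    for result, label in zip(sample_result_vectors, labels):
        for r, y in zip(result, label):
            total += (r - y) ** 2
            count += 1
    return total / count

def training_finished(error, threshold=0.01):
    """Hypothetical termination condition: stop once the error is small enough."""
    return error < threshold

preset_intents = ["agree", "refuse", "wait"]   # hypothetical intentions
sample_intents = ["refuse", "agree"]           # intention of each collected utterance
labels = label_values(sample_intents, preset_intents)

# Hypothetical sample result vectors produced by the network in one round.
sample_result_vectors = [[0.1, 0.8, 0.1], [0.9, 0.05, 0.05]]
error = mean_squared_error(sample_result_vectors, labels)
done = training_finished(error)
```

In a real training loop, the parameter adjustment step (for example, gradient descent) would run between error evaluations until `training_finished` returns true.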

[0091] As shown in FIG. 8, further, the network identification module 503 may include:

[0092] The neural network recognition unit 5031 is configured to, after the target vector is input into the convolutional neural network, for the feature map output by the convolutional layer in the convolutional neural network, use the convolution window in the convolutional neural network to extract the features on the feature map and perform the pooling operation, and then input the output vector obtained by the pooling operation into the fully connected layer in the convolutional neural network to obtain the target result vector.

[0093] Further, the text vectorization module may include:

[0094] A one-dimensional vector conversion unit, configured to convert each word in the target text into a one-dimensional line vector using a preset dictionary, where the dictionary records the correspondence between words and one-dimensional line vectors;

[0095] The two-dimensional vector composition unit is configured to compose each one-dimensional line vector into a two-dimensional vector as the target vector according to the order of each word in the target text.
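The two units above can be sketched as a dictionary lookup followed by stacking the resulting line (row) vectors in word order. The dictionary contents are hypothetical, and mapping an out-of-dictionary word to a zero vector is an assumption, since the application does not say how such words are handled.

```python
def vectorize(words, dictionary, unknown=None):
    """Map each word to its one-dimensional line vector and stack them in order."""
    dim = len(next(iter(dictionary.values())))
    if unknown is None:
        unknown = [0.0] * dim  # assumed fallback for words missing from the dictionary
    return [list(dictionary.get(word, unknown)) for word in words]

# Hypothetical preset dictionary: word -> one-dimensional line vector.
dictionary = {
    "I": [0.1, 0.3],
    "want": [0.7, 0.2],
    "refund": [0.4, 0.9],
}

target_vector = vectorize(["I", "want", "refund"], dictionary)
```

The result is a two-dimensional vector with one row per word, in the same order as the words appear in the target text.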

[0096] Further, the intention recognition device based on the convolutional neural network may further include:

[0097] A specified text deletion module, configured to delete the specified text in the target text, where the specified text includes at least stop words or punctuation marks;

[0098] The word segmentation processing module is configured to perform word segmentation processing on the target text after deleting the specified text to obtain each word in the target text.
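The preprocessing performed by these two modules can be sketched as follows. The stop-word list is hypothetical, and whitespace splitting is only a stand-in for a real word segmenter; Chinese text would need a dedicated segmentation tool, and non-ASCII punctuation would need its own handling.

```python
import re
import string

STOP_WORDS = {"the", "a", "um"}  # hypothetical stop-word list

def delete_specified_text(text, stop_words=STOP_WORDS):
    """Delete ASCII punctuation marks and whole-word stop words from the text."""
    text = text.translate(str.maketrans("", "", string.punctuation))
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in sorted(stop_words)) + r")\b"
    return re.sub(pattern, " ", text, flags=re.IGNORECASE)

def segment(text):
    """Split the cleaned text into words (whitespace split as a stand-in)."""
    return text.split()

words = segment(delete_specified_text("Um, I want the refund."))
```

The resulting word list is what the vectorization step would then convert into line vectors.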

[0100] For the specific definition of the intention recognition device based on the convolutional neural network, reference may be made to the above definition of the intention recognition method based on the convolutional neural network, which will not be repeated here. Each module in the above intention recognition device based on a convolutional neural network may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, the processor in the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.

[0101] In an embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the readable storage medium. The database of the computer device is used to store the data involved in the intent recognition method based on the convolutional neural network. The network interface of the computer device is used to communicate with external terminals through a network connection. The computer-readable instructions, when executed by the processor, implement an intent recognition method based on a convolutional neural network. The readable storage medium provided by this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.

[0102] In an embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the steps of the intent recognition method based on the convolutional neural network in the foregoing embodiments are implemented, for example, steps 101 to 104 shown in FIG. 2. Alternatively, when the processor executes the computer-readable instructions, the functions of the modules/units of the intention recognition device based on the convolutional neural network in the above embodiments are implemented, for example, the functions of modules 501 to 504 shown in FIG. 6. To avoid repetition, details are not repeated here.

[0103] In one embodiment, one or more readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the intent recognition method based on the convolutional neural network in the foregoing method embodiments; alternatively, when the computer-readable instructions are executed by one or more processors, the one or more processors implement the functions of each module/unit of the intention recognition device based on the convolutional neural network in the foregoing device embodiments. To avoid repetition, details are not repeated here. The readable storage medium provided by this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.

[0104] A person of ordinary skill in the art may understand that all or part of the processes in the methods of the foregoing embodiments may be completed by instructing relevant hardware through computer-readable instructions. The computer-readable instructions may be stored in a computer-readable storage medium, and when the computer-readable instructions are executed, the processes of the foregoing method embodiments may be included. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

[0105] Those skilled in the art can clearly understand that, for convenience and conciseness of description, only the above division of functional units and modules is used as an example. In practical applications, the above functions may be allocated to different functional units and modules as needed, that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above.

[0106] The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of the technical features may be equivalently replaced; such modifications or replacements do not cause the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included in the scope of protection of this application.

Claims

[Claim 1] An intention recognition method based on a convolutional neural network, characterized in that it includes:

Obtaining the target text whose intention is to be recognized;

Vectorizing the target text to obtain a target vector;

Inputting the target vector into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, where each element in the target result vector is a first probability value corresponding to a preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention;

Determining the preset user intention with the highest first probability value as the target user intention corresponding to the target text.

[Claim 2] The intent recognition method based on a convolutional neural network according to claim 1, characterized in that the convolutional neural network is pre-trained by the following steps:

Separately collecting utterance texts belonging to each preset user intention;

Vectorizing the collected utterance texts separately to obtain the sample vector corresponding to each utterance text;

For each preset user intention, recording the label value of the sample vectors corresponding to that preset user intention as 1 and the label value of the other sample vectors as 0, where the other sample vectors are the sample vectors other than those corresponding to that preset user intention;

For each preset user intention, inputting all sample vectors into the convolutional neural network for training to obtain sample result vectors, where each sample result vector is composed of elements that respectively represent the probability that the utterance text corresponding to the sample vector belongs to each preset user intention;

For each preset user intention, taking the output sample result vectors as the adjustment target and adjusting the parameters of the convolutional neural network so as to minimize the error between the obtained sample result vectors and the label values corresponding to each sample vector;

If the error between each sample result vector and the label value corresponding to each sample vector satisfies a preset training termination condition, determining that the convolutional neural network has been trained.

[Claim 3] The intent recognition method based on a convolutional neural network according to claim 1, characterized in that the inputting the target vector into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network includes:

After the target vector is input into the convolutional neural network, for the feature map output by the convolutional layer in the convolutional neural network, using the convolution window in the convolutional neural network to extract the features on the feature map and performing the pooling operation, and then inputting the output vector obtained by the pooling operation into the fully connected layer in the convolutional neural network to obtain the target result vector.

[Claim 4] The intent recognition method based on a convolutional neural network according to claim 1, characterized in that the vectorizing the target text to obtain the target vector includes:

Using a preset dictionary to convert each word in the target text into a one-dimensional line vector, where the dictionary records the correspondence between words and one-dimensional line vectors;

According to the order of each word in the target text, the one-dimensional line vectors are combined into a two-dimensional vector as the target vector.

[Claim 5] The intent recognition method based on a convolutional neural network according to any one of claims 1 to 4, characterized in that, before vectorizing the target text to obtain a target vector, the method further includes:

Delete the specified text in the target text, the specified text includes at least stop words or punctuation marks;

Perform word segmentation processing on the target text after deleting the specified text to obtain each word in the target text.

[Claim 6] An intention recognition device based on a convolutional neural network, characterized in that it includes:

A target text acquisition module, used to acquire the target text of the intention to be identified;

A text vectorization module, which is used to vectorize the target text to obtain a target vector;

A network recognition module, used to input the target vector into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, where each element in the target result vector is a first probability value corresponding to a preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention;

An intention determination module, used to determine the preset user intention with the highest first probability value as the target user intention corresponding to the target text.

[Claim 7] The intent recognition device based on a convolutional neural network according to claim 6, characterized in that the convolutional neural network is pre-trained by the following modules:

A text collection module, used to separately collect utterance texts belonging to each preset user intention;

A sample vectorization module, used to vectorize the collected utterance texts separately to obtain the sample vector corresponding to each utterance text;

A sample labeling module, used to, for each preset user intention, record the label value of the sample vectors corresponding to that preset user intention as 1 and the label value of the other sample vectors as 0, where the other sample vectors are the sample vectors other than those corresponding to that preset user intention;

A sample input module, used to, for each preset user intention, input all sample vectors into the convolutional neural network for training to obtain sample result vectors, where each sample result vector is composed of elements that respectively represent the probability that the utterance text corresponding to the sample vector belongs to each preset user intention;

A network parameter adjustment module, used to, for each preset user intention, take the output sample result vectors as the adjustment target and adjust the parameters of the convolutional neural network so as to minimize the error between the obtained sample result vectors and the label values corresponding to each sample vector;

A training completion determination module, used to determine that the convolutional neural network has been trained if the error between each sample result vector and the label value corresponding to each sample vector satisfies a preset training termination condition.

[Claim 8] The intention recognition device based on the convolutional neural network according to claim 6, wherein the network recognition module includes:

The neural network recognition unit is configured to, after the target vector is input into the convolutional neural network, for the feature map output by the convolutional layer in the convolutional neural network, use the convolution window in the convolutional neural network to extract the features on the feature map and perform the pooling operation, and then input the output vector obtained by the pooling operation into the fully connected layer in the convolutional neural network to obtain the target result vector.

[Claim 9] The intention recognition device based on the convolutional neural network according to claim 6, characterized in that the text vectorization module includes:

A one-dimensional vector conversion unit, used to convert each word in the target text into each one-dimensional line vector using a preset dictionary, and the dictionary records the correspondence between the words and each one-dimensional line vector;

The two-dimensional vector composition unit is configured to form each one-dimensional line vector into a two-dimensional vector as the target vector according to the order of each word in the target text.

[Claim 10] The intention recognition device based on a convolutional neural network according to any one of claims 6 to 9, wherein the intention recognition device based on a convolutional neural network further includes:

A designated text deletion module, used to delete the specified text in the target text, where the specified text includes at least stop words or punctuation marks;

The word segmentation processing module is configured to perform word segmentation processing on the target text after deleting the specified text to obtain each word in the target text.

[Claim 11] A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor implements the following steps when executing the computer-readable instructions:

Obtain the target text of the intention to be identified;

[Claim 12] The computer device according to claim 11, wherein the convolutional neural network is pre-trained by the following steps:

Collect the utterance text that belongs to the intention of each preset user separately;

Vectorize the collected utterance texts separately to obtain the sample vector corresponding to each utterance text;

[Claim 13] The computer device according to claim 11, wherein the inputting the target vector into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network includes:

[Claim 14] The computer device according to claim 11, wherein the vectorizing the target text to obtain a target vector includes:

A preset dictionary is used to convert each word in the target text into each one-dimensional line vector, and the dictionary records the correspondence between the words and each one-dimensional line vector;

[Claim 15] The computer device according to any one of claims 11 to 14, characterized in that, before vectorizing the target text to obtain a target vector, the processor further implements the following steps when executing the computer-readable instructions:

[Claim 16] One or more readable storage media storing computer-readable instructions, characterized in that, when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:

Obtain the target text of the intention to be identified;

[Claim 17] The readable storage medium according to claim 16, wherein the convolutional neural network is pre-trained by the following steps:

For each preset user intention, recording the label value of the sample vectors corresponding to that preset user intention as 1 and the label value of the other sample vectors as 0, where the other sample vectors are the sample vectors other than those corresponding to that preset user intention;

For each preset user intention, inputting all sample vectors into the convolutional neural network for training to obtain sample result vectors, where each sample result vector is composed of elements that respectively represent the probability that the utterance text corresponding to the sample vector belongs to each preset user intention;

For each preset user intention, taking the output sample result vectors as the adjustment target and adjusting the parameters of the convolutional neural network so as to minimize the error between the obtained sample result vectors and the label values corresponding to each sample vector;

If the error between each sample result vector and the label value corresponding to each sample vector satisfies a preset training termination condition, determining that the convolutional neural network has been trained.

[Claim 18] The readable storage medium according to claim 16, wherein the inputting the target vector into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network includes:

[Claim 19] The readable storage medium according to claim 16, wherein the vectorizing the target text to obtain a target vector includes:

[Claim 20] The readable storage medium according to any one of claims 16 to 19, characterized in that, before vectorizing the target text to obtain a target vector, when the computer-readable instructions are executed by one or more processors, the one or more processors further perform the following steps: