WO2020140612A1 - Convolutional neural network-based intention recognition method, apparatus, device, and medium - Google Patents

Info

Publication number
WO2020140612A1
WO2020140612A1 PCT/CN2019/117097 CN2019117097W WO2020140612A1 WO 2020140612 A1 WO2020140612 A1 WO 2020140612A1 CN 2019117097 W CN2019117097 W CN 2019117097W WO 2020140612 A1 WO2020140612 A1 WO 2020140612A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
target
neural network
convolutional neural
text
Prior art date
Application number
PCT/CN2019/117097
Other languages
French (fr)
Chinese (zh)
Inventor
王健宗
程宁
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020140612A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Definitions

  • The present application relates to the field of deep learning technology, and in particular to an intent recognition method, apparatus, device, and medium based on a convolutional neural network.
  • Embodiments of the present application provide an intent recognition method, device, computer device, and storage medium based on a convolutional neural network to solve the problem of low accuracy of intent recognition.
  • An intention recognition method based on a convolutional neural network includes:
  • the target vector is used as an input to a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network; each element in the target result vector is a first probability value corresponding to a preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention;
  • the preset user intention with the highest first probability value is determined as the target user intention corresponding to the target text.
  • An intention recognition device based on a convolutional neural network includes:
  • a target text acquisition module configured to acquire target text of an intention to be identified
  • a text vectorization module configured to vectorize the target text to obtain a target vector
  • a network recognition module configured to input the target vector into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, where each element in the target result vector is a first probability value corresponding to a preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention;
  • an intention determining module configured to determine the preset user intention with the highest first probability value as the target user intention corresponding to the target text.
  • A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when executing the computer-readable instructions, the processor implements the steps of the above intent recognition method based on a convolutional neural network.
  • One or more readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above intent recognition method based on a convolutional neural network.
  • FIG. 1 is a schematic diagram of an application environment of an intent recognition method based on a convolutional neural network in an embodiment of the present application;
  • FIG. 2 is a flowchart of an intent recognition method based on a convolutional neural network in an embodiment of the present application
  • FIG. 3 is a schematic flowchart of an intent recognition method step 102 based on a convolutional neural network in an application scenario in an embodiment of the present application;
  • FIG. 4 is a schematic diagram of a process of preprocessing target text in an application scenario based on a convolutional neural network-based intent recognition method in an embodiment of the present application;
  • FIG. 5 is a schematic flowchart of pre-training a convolutional neural network in an application scenario based on an intent recognition method of a convolutional neural network in an embodiment of the present application;
  • FIG. 6 is a schematic structural diagram of an intention recognition device based on a convolutional neural network in an embodiment of the present application
  • FIG. 7 is a schematic structural view of some modules of a pre-trained convolutional neural network in an intention recognition device based on a convolutional neural network in an embodiment of the present application;
  • FIG. 8 is a schematic structural diagram of a network identification module in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a computer device in an embodiment of the present application.
  • the intent recognition method based on the convolutional neural network provided in the present application can be applied in an application environment as shown in FIG. 1, wherein the client communicates with the server through the network.
  • the client may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server can be implemented with an independent server or a server cluster composed of multiple servers.
  • a method for intent recognition based on a convolutional neural network is provided.
  • the method is applied to the server in FIG. 1 as an example for illustration, including the following steps:
  • the server may obtain the target text of the intention to be recognized according to actual use needs or application scene needs.
  • the server can communicate with the client, which is provided for users to ask questions: the user inputs a voice question through the microphone of the client, and the client uploads the voice question to the server.
  • the server transcribes the voice question into text, and this text is the target text of the intention to be recognized.
  • the server can also perform the task of recognizing the user's intentions for a large number of utterance texts.
  • a database collects a large number of utterance texts in advance and then transmits multiple utterance texts to the server through the network.
  • the server needs to perform intent recognition on these utterance texts, so each of these utterance texts is a target text. It can be understood that the server can also obtain the target text of the intention to be recognized in various other ways, which are not described in detail here. Any text whose intent the server is required to recognize can be used as the target text.
  • the text referred to in this embodiment generally refers to utterance text, that is, text content obtained by transcribing words spoken by a person.
  • the server needs to vectorize the target text, that is, represent the text as a vector, to obtain the target vector.
  • the server may record the target text in the form of a data matrix. In the data matrix, each word in the target text is mapped to a row vector in the data matrix.
  • the step 102 may specifically include:
  • a preset dictionary is used to convert each word in the target text into a one-dimensional row vector, and the dictionary records the correspondence between words and one-dimensional row vectors;
  • the server presets a dictionary that records the one-to-one correspondence between each word and each one-dimensional row vector. For example, “I” can be set to correspond to “row vector No. 1”, “He” to “row vector No. 2”, “You” to “row vector No. 3”, and so on. The dictionary is completed by covering as many words as possible, so that when the target text needs to be converted, the server can use the preset dictionary to convert each word in the target text into a one-dimensional row vector. For example, suppose that the target text is "I and you go to dinner”.
  • row vectors 1-5 refer to the row vectors numbered 1, 2, 3, 4, and 5.
  • each row vector should be a one-dimensional matrix containing multiple elements; for example, [7, 51, 423, 50, 0] is a one-dimensional row vector.
  • this one-dimensional row vector can be defined as the k-th row vector, where k is greater than or equal to 1.
  • the dictionary may be constructed automatically, with the server extending the dictionary while using it. Specifically, when text needs to be converted into one-dimensional row vectors, the server obtains the words in the text one by one and queries whether the dictionary records a correspondence between the word and a one-dimensional row vector. If it does, the server obtains the one-dimensional row vector corresponding to the word; if not, the server adds the word to the dictionary, assigns it a previously unassigned one-dimensional row vector, and then obtains that row vector. After obtaining the one-dimensional row vectors corresponding to all the words in the text, the server can perform the following step 202 to construct the two-dimensional vector. In this way, words in the text that were not previously recorded in the dictionary are also added to it, which completes the dictionary.
  • the unassigned one-dimensional row vectors can be set manually by staff, or existing word vectors can be obtained from a third-party platform; for example, the word vectors provided by websites such as Sina, Zhi… can be loaded and used as the one-dimensional row vectors required for setting the dictionary in this embodiment.
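The look-up-or-assign behaviour of the dictionary described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the class name `GrowingDictionary`, the pool of unassigned vectors, and the vector values are all hypothetical.

```python
from typing import Dict, List


class GrowingDictionary:
    """Maps words to one-dimensional row vectors, assigning new ones on demand."""

    def __init__(self, unassigned_vectors: List[List[float]]) -> None:
        # Pool of row vectors not yet tied to any word (hypothetical values).
        self._unassigned = list(unassigned_vectors)
        self._table: Dict[str, List[float]] = {}

    def lookup(self, word: str) -> List[float]:
        """Return the row vector for `word`, adding the word if it is new."""
        if word not in self._table:
            if not self._unassigned:
                raise LookupError(f"no unassigned row vector left for {word!r}")
            # Assign the next unassigned row vector to the new word.
            self._table[word] = self._unassigned.pop(0)
        return self._table[word]

    def __len__(self) -> int:
        return len(self._table)


# Hypothetical pool of five 2-element row vectors.
pool = [[float(i), float(i) * 2.0] for i in range(1, 6)]
dictionary = GrowingDictionary(pool)
first = dictionary.lookup("I")    # new word: receives the first unassigned vector
again = dictionary.lookup("I")    # known word: the same vector is returned
other = dictionary.lookup("you")  # new word: receives the next unassigned vector
```

In practice the pool would more likely hold pre-trained word vectors loaded from a third-party source, as the description suggests.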
  • the five one-dimensional row vectors “1, 2, 3, 4, 5” are sequentially formed into a two-dimensional matrix, that is, a two-dimensional vector, to obtain the target vector.
  • X1-X5 respectively represent the above-mentioned row vectors 1-5.
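As a rough sketch of how the row vectors are stacked in word order to form the two-dimensional target vector, consider the following. The five-token split of the example sentence, the function name `build_target_vector`, and all vector values are assumptions made only for illustration.

```python
from typing import Dict, List, Sequence


def build_target_vector(
    words: Sequence[str], dictionary: Dict[str, List[float]]
) -> List[List[float]]:
    """Stack the row vector of each word, in word order, into a 2D matrix."""
    return [list(dictionary[word]) for word in words]


# Hypothetical dictionary: each word maps to a 3-element row vector.
toy_dictionary = {
    "I": [0.1, 0.2, 0.3],
    "and": [0.4, 0.5, 0.6],
    "you": [0.7, 0.8, 0.9],
    "go": [1.0, 1.1, 1.2],
    "dinner": [1.3, 1.4, 1.5],
}

# Hypothetical segmentation of the example sentence into five words.
words = ["I", "and", "you", "go", "dinner"]
target_vector = build_target_vector(words, toy_dictionary)  # 5 rows x 3 columns
```

Each row of `target_vector` plays the role of one of X1 to X5, and the row order follows the word order in the text.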
  • the target text in step 102 may not meet format requirements or may contain considerable interference information. Therefore, in this embodiment, the target text may be preprocessed before it is converted into a target vector. Preprocessing makes the format and content of the target text more convenient for vector conversion and for subsequent recognition and analysis by the convolutional neural network. As shown in FIG. 4, further, before step 102, the method further includes:
  • the stop words mentioned here may refer to single Chinese characters with a particularly high frequency of use, such as Chinese characters such as " ⁇ ", " ⁇ ", etc., which have no actual language meaning.
  • the designated text may also include punctuation marks, such as commas and periods, which likewise have no actual linguistic meaning.
  • the server can delete the specified text in the target text.
  • the specified text includes stop words and punctuation marks
  • the target text includes the text "I am coming to work today.”
  • stop words such as " ⁇ ” which have no practical meaning are deleted
  • punctuation marks such as ".” are deleted, so that the deleted text "I come to work today” is obtained.
  • the server can also perform word segmentation processing on the target text. Continuing with the above text "I come to work today", the server can use a third-party word segmentation tool to segment the text into the four words "I come to work today".
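The preprocessing described above (deleting designated text, then segmenting into words) might look roughly like the sketch below. It is written for English text, so two assumptions differ from the source: whitespace splitting stands in for the third-party word segmentation tool, and stop words are filtered after splitting rather than before. The stop-word set is hypothetical.

```python
import string
from typing import List, Set

# Hypothetical stop-word set; the source refers to high-frequency Chinese characters.
STOP_WORDS: Set[str] = {"am"}


def preprocess(text: str, stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Delete punctuation and stop words, then return the remaining words."""
    # Remove every ASCII punctuation mark from the text.
    without_punctuation = text.translate(str.maketrans("", "", string.punctuation))
    # Whitespace splitting is a stand-in for a real word segmentation tool.
    words = without_punctuation.split()
    return [word for word in words if word.lower() not in stop_words]


cleaned = preprocess("I am coming to work today.")
```

For Chinese text, a dedicated segmentation library would replace the `split()` call, since words are not separated by spaces.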
  • each element in the target result vector is a first probability value corresponding to a preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention
  • the server may use the target vector as an input to a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, where each element in the target result vector is a first probability value corresponding to a preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention.
  • the target result vector generally contains multiple elements, each of which is a first probability value. These first probability values correspond one-to-one with the preset user intentions and indicate the probability that the target text belongs to the corresponding preset user intention. The larger the first probability value corresponding to a preset user intention, the higher the probability that the target text belongs to that preset user intention.
  • the training process of the convolutional neural network will be described in detail below. As shown in FIG. 5, further, the convolutional neural network is pre-trained through the following steps:
  • the sample result vector is composed of multiple elements, and each element represents the probability that the utterance text corresponding to the sample vector belongs to one of the preset user intentions;
  • each output sample result vector is used as an adjustment target, and the parameters of the convolutional neural network are adjusted to minimize the error between the obtained sample result vectors and the label values corresponding to the sample vectors;
  • the staff may set in advance, on the server, the preset user intentions that need to be trained, which may include, for example, intentions such as “agree to listen”, “refuse to buy”, and “willing to wait”. For these preset user intentions, the staff also needs to collect the corresponding utterance texts in specific application scenarios, such as utterance texts converted from questions actually asked by users.
  • the server can collect utterance texts belonging to each preset user intention through professional knowledge bases, network databases, and other channels.
  • the number of utterance texts corresponding to each preset user intention should reach a certain order of magnitude. The numbers of utterance texts for different preset user intentions may differ somewhat, but should not be too far apart, to avoid affecting the training effect of the convolutional neural network. For example, the collected utterance texts may be: 1 million utterance texts corresponding to "agree to listen", 200,000 corresponding to "refuse to buy", and 300,000 corresponding to "willing to wait".
  • for step 402, it can be understood that, before the utterance texts are input to the convolutional neural network for training, each collected utterance text needs to be vectorized to obtain its corresponding sample vector; converting text into vectors makes it easier for the convolutional neural network to process and be trained on. It should be noted that, because the collected utterance texts come from many sources, their formats are often not uniform, which can easily interfere with subsequent training. Therefore, the server can preprocess these utterance texts before vectorizing them, including deleting stop words and punctuation marks and performing word segmentation.
  • the server can first delete stop words such as " ⁇ ” that have no practical meaning and delete punctuation marks such as ".", and then use a third-party word segmentation tool to segment the text into the four words "I come to work today".
  • the server then vectorizes each word in the utterance text to obtain a row vector corresponding to each word in the utterance text.
  • by vectorizing each word in the utterance text, the server obtains multiple row vectors, which constitute the sample vector (a two-dimensional vector) corresponding to the utterance text.
  • the sample vector may be recorded in the form of a data matrix.
  • for step 404, during training, for each preset user intention, all sample vectors are input to the convolutional neural network for training to obtain sample result vectors.
  • the label value of each sample vector corresponding to the preset user intention is 1, and the label values of the other sample vectors are all 0.
  • the convolutional neural network outputs a sample result vector composed of N elements, and these N elements respectively represent the probabilities that the utterance text corresponding to the sample vector belongs to the N preset user intentions.
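One common way to express the 1/0 label values against an N-element sample result vector is a one-hot label vector, sketched below. The intent names come from the description; the function name `label_vector` and the choice of a one-hot encoding are assumptions rather than something the source states explicitly.

```python
from typing import List, Sequence

# Preset user intentions named in the description.
INTENTS = ["agree to listen", "refuse to buy", "willing to wait"]


def label_vector(intent: str, intents: Sequence[str] = INTENTS) -> List[int]:
    """Return 1 at the position of the sample's intent and 0 elsewhere."""
    if intent not in intents:
        raise ValueError(f"unknown intent: {intent!r}")
    return [1 if name == intent else 0 for name in intents]


refuse_label = label_vector("refuse to buy")
```

The label vector has the same length N as the sample result vector, so the two can be compared element by element when computing the training error.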
  • the step 404 may specifically include: when each sample vector is input to the convolutional neural network for training, randomly discarding the feature map output by the convolutional layer according to a preset first discard probability, using convolution windows of preset sizes to extract the features on the feature map and perform the pooling operation, then randomly discarding the output vector obtained by the pooling operation according to a preset second discard probability, and inputting the remaining output vector into the fully connected layer to obtain the sample result vector.
  • the first discard probability and the second discard probability may be set according to actual usage, for example to 0.6. It can be seen that a discarding operation on the feature map and a discarding operation on the pooled output vector are added here.
  • although each training pass on the convolutional neural network is weakened to a certain extent, the speed of each pass is greatly increased. This allows the convolutional neural network to complete training on a large number of samples in a short time, which improves the overall training effect of the convolutional neural network and yields higher recognition accuracy.
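The random discarding step can be illustrated with a small sketch. It assumes that "discarding" an element means setting it to zero, and it does not rescale the surviving elements as some dropout variants do; the function name `random_discard` and the sample values are hypothetical.

```python
import random
from typing import List


def random_discard(
    values: List[float], discard_probability: float, rng: random.Random
) -> List[float]:
    """Zero out each element independently with the given probability."""
    if not 0.0 <= discard_probability <= 1.0:
        raise ValueError("discard_probability must be between 0 and 1")
    # rng.random() is in [0, 1), so probability 0 keeps all and 1 discards all.
    return [0.0 if rng.random() < discard_probability else v for v in values]


rng = random.Random(0)  # fixed seed so repeated runs behave the same
pooled_output = [0.8, 1.5, 0.2, 2.1]  # hypothetical pooled features
kept = random_discard(pooled_output, 0.6, rng)  # 0.6 as in the description
```

Discarding is applied only during training; at recognition time the description passes the pooled output directly to the fully connected layer.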
  • the parameters of the convolutional neural network need to be adjusted.
  • the network structure of a convolutional neural network mainly includes a convolutional layer, a pooling layer, a random deactivation (dropout) layer, a regularization layer, and a softmax layer. Each layer has several parameters, and adjusting these parameters during sample training affects the output of the convolutional neural network.
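The softmax layer named above is what turns raw scores into probability values that sum to 1. A minimal sketch of the standard softmax function follows; the input scores are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    highest = max(scores)
    exponentials = [math.exp(score - highest) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


probabilities = softmax([1.0, 2.0, 3.0])  # hypothetical scores for three intents
```

Because softmax preserves the ordering of the scores, the intent with the highest score is also the intent with the highest probability.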
  • the output result is: [0.3, 0.2, 0.5]
  • the values of the three elements represent the probabilities that the utterance text corresponding to the sample vector belongs to the three preset user intentions "agree to listen", "refuse to buy", and "willing to wait", respectively: the probability that the utterance text belongs to "agree to listen" is 0.3, the probability that it belongs to "refuse to buy" is 0.2, and the probability that it belongs to "willing to wait" is 0.5.
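To make the notion of "error between the sample result vector and the label value" concrete, the sketch below computes a sum of squared differences for the output [0.3, 0.2, 0.5]. The source does not say which error measure is used, so the squared error here is an assumption, as is the choice of "willing to wait" as the sample's true intent.

```python
from typing import Sequence


def squared_error(result: Sequence[float], label: Sequence[float]) -> float:
    """Sum of squared differences between a result vector and its label."""
    if len(result) != len(label):
        raise ValueError("result and label must have the same length")
    return sum((r - t) ** 2 for r, t in zip(result, label))


sample_result = [0.3, 0.2, 0.5]  # output from the description
label = [0, 0, 1]  # hypothetical: the sample is labeled "willing to wait"
error = squared_error(sample_result, label)  # 0.09 + 0.04 + 0.25
```

Training adjusts the network parameters so that this kind of error shrinks across the samples.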
  • satisfying the training termination condition means that the parameters in the convolutional neural network have been adjusted into place, and it can be determined that the convolutional neural network has been trained; otherwise, the convolutional neural network needs to continue training.
  • the training termination condition may be preset according to actual usage. Specifically, it may be set as follows: if the error between each sample result vector and the label value corresponding to each sample vector is less than a specified error value, the preset training termination condition is considered satisfied.
  • alternatively, it can be set as follows: the utterance texts in a verification set are used to perform the above steps 402-404, and if the error between the sample result vector output by the convolutional neural network and the label value is within a certain range, the preset training termination condition is considered satisfied.
  • the utterance texts in the verification set are collected in a manner similar to the above step 401. Specifically, after the foregoing step 401 is performed to collect the utterance texts of each preset user intention, a certain percentage of the collected utterance texts is assigned to the training set, and the remaining utterance texts are assigned to the verification set.
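The split into training and verification sets and the error-threshold termination check can be sketched as follows. The function names, the 75% split, the sample names, and the error values are all hypothetical; the source only says "a certain percentage" and "a specified error value".

```python
from typing import List, Sequence, Tuple


def split_samples(
    samples: Sequence[str], train_fraction: float
) -> Tuple[List[str], List[str]]:
    """Assign the first part of the samples to training, the rest to verification."""
    cut = int(len(samples) * train_fraction)
    return list(samples[:cut]), list(samples[cut:])


def training_finished(errors: Sequence[float], max_error: float) -> bool:
    """True when every per-sample error is below the specified error value."""
    return all(error < max_error for error in errors)


utterances = [f"utterance {i}" for i in range(8)]  # hypothetical collected texts
training_set, verification_set = split_samples(utterances, 0.75)

done = training_finished([0.02, 0.04, 0.01], max_error=0.05)
not_done = training_finished([0.02, 0.40, 0.01], max_error=0.05)
```

A real pipeline would usually shuffle the utterance texts before splitting so that both sets cover every preset user intention.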
  • step 103 may specifically include: after inputting the target vector into the convolutional neural network, for the feature map output by the convolutional layer in the convolutional neural network, using the convolution window in the convolutional neural network to extract the features on the feature map and perform the pooling operation, and then inputting the output vector obtained by the pooling operation into the fully connected layer in the convolutional neural network to obtain the target result vector.
  • the convolution window used here is the convolution window in the trained convolutional neural network. The size and number of the convolution windows were adjusted during training, so they need not be considered here.
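The recognition pass in step 103 (convolution windows, pooling, fully connected layer) can be sketched in plain Python as below. This simplified version is an assumption-laden illustration: it applies the windows directly to the target matrix, uses a ReLU activation and max pooling, ends with softmax, and all weights are made-up values rather than trained parameters.

```python
import math
from typing import List

Matrix = List[List[float]]


def convolve_window(target: Matrix, window: Matrix) -> List[float]:
    """Slide a window spanning full rows down the matrix; one feature per position."""
    height = len(window)
    features = []
    for start in range(len(target) - height + 1):
        total = 0.0
        for offset in range(height):
            for x, w in zip(target[start + offset], window[offset]):
                total += x * w
        features.append(max(total, 0.0))  # ReLU activation (an assumption)
    return features


def forward(target: Matrix, windows: List[Matrix],
            fc_weights: Matrix, fc_bias: List[float]) -> List[float]:
    """Convolve, max-pool each window's features, apply a dense layer and softmax."""
    # The target needs at least as many rows as the tallest window.
    pooled = [max(convolve_window(target, window)) for window in windows]
    scores = [sum(p * w for p, w in zip(pooled, row)) + b
              for row, b in zip(fc_weights, fc_bias)]
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 5-word target matrix with 2-element row vectors.
target = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
windows = [[[1.0, 0.0], [0.0, 1.0]],                # window covering 2 words
           [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]]    # window covering 3 words
fc_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # 3 intents x 2 pooled features
fc_bias = [0.0, 0.0, 0.0]
target_result = forward(target, windows, fc_weights, fc_bias)
```

No discarding appears here, which matches the description: the random discard operations are part of training only.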
  • 104. Determine the preset user intention with the highest first probability value as the target user intention corresponding to the target text.
  • each element in the target result vector is a first probability value corresponding to a preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention; the higher the first probability value, the higher the probability that the target text belongs to that preset user intention. Therefore, the server determines the preset user intention with the highest first probability value as the target user intention corresponding to the target text, which captures the user's actual situation and real intention to the greatest extent.
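Selecting the target user intention amounts to taking the position of the largest first probability value. A minimal sketch follows, reusing the intent names and the example vector [0.3, 0.2, 0.5] from the description; the function name `pick_intent` is hypothetical.

```python
from typing import Sequence


def pick_intent(result_vector: Sequence[float], intents: Sequence[str]) -> str:
    """Return the preset intent whose first probability value is highest."""
    if len(result_vector) != len(intents):
        raise ValueError("one probability is required per preset intent")
    # Index of the largest probability; ties resolve to the earliest index.
    best = max(range(len(result_vector)), key=lambda i: result_vector[i])
    return intents[best]


intents = ["agree to listen", "refuse to buy", "willing to wait"]
target_intent = pick_intent([0.3, 0.2, 0.5], intents)
```

With this example vector, the selected intention is "willing to wait", since 0.5 is the largest of the three values.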
  • first, the target text of the intention to be recognized is obtained; then, the target text is vectorized to obtain a target vector; next, the target vector is input to the pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, where each element in the target result vector is a first probability value corresponding to a preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention; finally, the preset user intention with the highest first probability value is determined as the target user intention corresponding to the target text.
  • this application can accurately identify the user's true intention from the target text through the pre-trained convolutional neural network, which not only avoids the recognition bias caused by the gap in experience and knowledge, but also eliminates the influence of human subjective factors.
  • the accuracy of intent recognition is improved, which helps companies to grasp the true intentions of users and facilitate transactions.
  • an intent recognition device based on a convolutional neural network corresponds to the intent recognition method based on the convolutional neural network in the above embodiment.
  • the intention recognition device based on a convolutional neural network includes a target text acquisition module 501, a text vectorization module 502, a network recognition module 503, and an intention determination module 504. The detailed description of each function module is as follows:
  • the target text acquisition module 501 is used to acquire the target text of the intention to be identified
  • a text vectorization module 502 configured to vectorize the target text to obtain a target vector
  • the network recognition module 503 is configured to input the target vector into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, where each element in the target result vector is a first probability value corresponding to a preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention;
  • the intention determining module 504 is configured to determine the preset user intention with the highest first probability value as the target user intention corresponding to the target text.
  • the convolutional neural network may be pre-trained by the following modules:
  • the text collection module 505 is configured to separately collect utterance text belonging to the intention of each preset user;
  • the sample vectorization module 506 is used to vectorize the collected utterance text to obtain a sample vector corresponding to each utterance text;
  • the sample marking module 507 is configured to, for each preset user intention, record the label value of the sample vectors corresponding to the preset user intention as 1 and the label value of the other sample vectors as 0, where the other sample vectors are the sample vectors other than those corresponding to the preset user intention;
  • the sample input module 508 is configured to, for each preset user intention, input all sample vectors to the convolutional neural network for training to obtain sample result vectors, where each sample result vector is composed of multiple elements and each element represents the probability that the utterance text corresponding to the sample vector belongs to one of the preset user intentions;
  • the network parameter adjustment module 509 is configured to adjust the parameters of the convolutional neural network with each output sample result vector as the adjustment target for each preset user intent to minimize the obtained sample results The error between the vector and the label value corresponding to each sample vector;
  • the training completion determination module 510 is configured to determine that the convolutional neural network has been trained if the error between each sample result vector and the label value corresponding to each sample vector satisfies a preset training termination condition.
  • the network identification module 503 may include:
  • the neural network recognition unit 5031 is configured to, after the target vector is input to the convolutional neural network, for the feature map output by the convolutional layer in the convolutional neural network, use the convolution window in the convolutional neural network to extract the features on the feature map and perform the pooling operation, and then input the output vector obtained by the pooling operation into the fully connected layer in the convolutional neural network to obtain the target result vector.
  • the text vectorization module may include:
  • a one-dimensional vector conversion unit configured to convert each word in the target text into a one-dimensional row vector using a preset dictionary, where the dictionary records the correspondence between words and one-dimensional row vectors;
  • the two-dimensional vector composition unit is configured to compose the one-dimensional row vectors into a two-dimensional vector, as the target vector, according to the order of the words in the target text.
  • the intention recognition device based on the convolutional neural network may further include:
  • a designated text deleting module configured to delete the designated text in the target text, the designated text includes at least stop words or punctuation marks;
  • the word segmentation processing module is configured to perform word segmentation processing on the target text after deleting the specified text to obtain each word in the target text.
  • each module in the above intention recognition device based on a convolutional neural network may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above modules may be embedded in, or independent of, the processor in the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device may be a server, and an internal structure diagram thereof may be as shown in FIG. 9.
  • the computer equipment includes a processor, memory, network interface, and database connected by a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a readable storage medium and internal memory.
  • the readable storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for operating systems and computer-readable instructions in a readable storage medium.
  • the database of the computer device is used to store the data involved in the intent recognition method based on the convolutional neural network.
  • the network interface of the computer device is used to communicate with external terminals through a network connection.
  • the computer readable instructions are executed by the processor to implement a method of intent recognition based on convolutional neural networks.
  • the readable storage medium provided by this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
  • A computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the steps of the intent recognition method based on the convolutional neural network in the foregoing embodiments are implemented, for example steps 101 to 104 shown in FIG. 2. Alternatively, when the processor executes the computer-readable instructions, the functions of the modules/units of the intention recognition device based on the convolutional neural network in the above embodiments are implemented, for example the functions of modules 501 to 504 shown in FIG. 6. To avoid repetition, details are not repeated here.
  • One or more readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the intent recognition method based on the convolutional neural network in the foregoing method embodiments, or implement the functions of the modules/units of the intention recognition device based on the convolutional neural network in the foregoing device embodiments. To avoid repetition, details are not repeated here.
  • the readable storage medium provided by this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
  • Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The present application discloses a convolutional neural network-based intention recognition method, an apparatus, a device and a medium, being applied to the field of deep learning technology, and being used for solving the problem of low accuracy of intention recognition. Said method provided in the present application comprises: acquiring a target text of which the intention is to be recognized; performing vectorization processing on the target text, to obtain a target vector; putting the target vector, as an input, into a pre-trained convolutional neural network, to obtain a target result vector outputted by the convolutional neural network, respective elements in the target result vector being respectively first probability values corresponding to respective preset user intentions, a first probability value characterizing the probability that the target text indicates the corresponding preset user intention; and determining the preset user intention having the highest first probability value as a target user intention corresponding to the target text.

Description

Intent recognition method, apparatus, device, and medium based on a convolutional neural network
[0001] This application is based on, and claims priority to, Chinese invention patent application No. 201910007860.0, filed on January 4, 2019, and titled "Intent Recognition Method, Apparatus, Device, and Medium Based on Convolutional Neural Network".
Technical Field
[0002] The present application relates to the field of deep learning technology, and in particular to an intent recognition method, apparatus, device, and medium based on a convolutional neural network.
Background
[0005] In the market, accurately grasping a user's intent from what the user says greatly helps to close a transaction. For example, in a telemarketing scenario, recognizing the intent behind the user's utterances is crucial to whether a product can be sold successfully. Utterances are the outward expression of the user's inner thoughts and reveal the user's true feelings and underlying needs. If the user's intent can be captured correctly from these utterances, the success rate of sales can be improved and corporate revenue and brand awareness can be increased, without harming the user experience.
[0006] At present, most enterprises employ customer service personnel to communicate with users, and rely on the experience and knowledge of those personnel to judge the user's true intent and close the transaction. However, customer service personnel differ in experience and knowledge, and human subjective factors also play a role, so the user's true intent is easily misjudged, which results in low accuracy of intent recognition.
Summary of the Invention
[0009] Embodiments of the present application provide an intent recognition method, apparatus, computer device, and storage medium based on a convolutional neural network, to solve the problem of low accuracy of intent recognition.
[0010] An intent recognition method based on a convolutional neural network includes:
[0011] acquiring target text whose intent is to be recognized;
[0012] vectorizing the target text to obtain a target vector;
[0013] feeding the target vector as input into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, where each element of the target result vector is a first probability value corresponding to one preset user intent, and a first probability value represents the probability that the target text belongs to the corresponding preset user intent; and
[0014] determining the preset user intent with the highest first probability value as the target user intent corresponding to the target text.
[0015] An intent recognition apparatus based on a convolutional neural network includes:
[0016] a target text acquisition module, configured to acquire target text whose intent is to be recognized;
[0017] a text vectorization module, configured to vectorize the target text to obtain a target vector;
[0018] a network recognition module, configured to feed the target vector as input into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, where each element of the target result vector is a first probability value corresponding to one preset user intent, and a first probability value represents the probability that the target text belongs to the corresponding preset user intent; and
[0019] an intent determination module, configured to determine the preset user intent with the highest first probability value as the target user intent corresponding to the target text.
[0020] A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor, when executing the computer-readable instructions, implements the steps of the above intent recognition method based on a convolutional neural network.
[0021] One or more readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above intent recognition method based on a convolutional neural network.
[0022] Details of one or more embodiments of the present application are set forth in the following drawings and description. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
[0025] To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
[0026] FIG. 1 is a schematic diagram of an application environment of an intent recognition method based on a convolutional neural network in an embodiment of the present application;
[0027] FIG. 2 is a flowchart of an intent recognition method based on a convolutional neural network in an embodiment of the present application;
[0028] FIG. 3 is a schematic flowchart of step 102 of the intent recognition method based on a convolutional neural network in an application scenario in an embodiment of the present application;
[0029] FIG. 4 is a schematic flowchart of preprocessing the target text in an application scenario of the intent recognition method based on a convolutional neural network in an embodiment of the present application;
[0030] FIG. 5 is a schematic flowchart of pre-training the convolutional neural network in an application scenario of the intent recognition method based on a convolutional neural network in an embodiment of the present application;
[0031] FIG. 6 is a schematic structural diagram of an intent recognition apparatus based on a convolutional neural network in an embodiment of the present application;
[0032] FIG. 7 is a schematic structural diagram of the modules used to pre-train the convolutional neural network in the intent recognition apparatus based on a convolutional neural network in an embodiment of the present application;
[0033] FIG. 8 is a schematic structural diagram of a network recognition module in an embodiment of the present application;
[0034] FIG. 9 is a schematic diagram of a computer device in an embodiment of the present application.
Detailed Description
[0037] The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
[0038] The intent recognition method based on a convolutional neural network provided in the present application can be applied in the application environment shown in FIG. 1, in which a client communicates with a server through a network. The client may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device. The server may be implemented as an independent server or as a server cluster composed of multiple servers.
[0039] In an embodiment, as shown in FIG. 2, an intent recognition method based on a convolutional neural network is provided. Taking the method as applied to the server in FIG. 1 as an example, it includes the following steps:
[0040] 101. Acquire target text whose intent is to be recognized;
[0041] In this embodiment, the server may acquire the target text whose intent is to be recognized according to the needs of actual use or of the application scenario. For example, the server may be communicatively connected to a client that lets users at a given venue ask questions. A user speaks a question into the client's microphone, the client uploads the spoken question to the server, and the server converts the speech to text; that text is the target text whose intent is to be recognized. Alternatively, the server may perform the task of recognizing user intent for a large batch of utterance texts: a database collects a large number of utterance texts in advance and transmits them to the server over the network, the server needs to recognize the intent of each of them, and each of these utterance texts is therefore a target text. The server may also acquire target texts in many other ways, which are not enumerated here; any text whose intent the server needs to recognize can serve as target text.
[0042] It should be noted that the text referred to in this embodiment generally means utterance text, that is, text content obtained by speech-to-text conversion of words spoken by a person.
[0043] 102. Vectorize the target text to obtain a target vector;
[0044] After the target text is acquired, and to facilitate subsequent recognition and learning by the convolutional neural network, the server needs to vectorize the target text, that is, represent the text as a vector, thereby obtaining the target vector. Specifically, the server may record the target text in the form of a data matrix, in which each word of the target text is mapped to one row vector of the matrix.
[0045] For ease of understanding, in a specific application scenario, as shown in FIG. 3, step 102 may specifically include:
[0046] 201. Use a preset dictionary to convert each word in the target text into a one-dimensional row vector, where the dictionary records the correspondence between words and one-dimensional row vectors;
[0047] 202. Combine the one-dimensional row vectors, in the order of the words in the target text, into a two-dimensional vector that serves as the target vector.
[0048] For step 201 above, the server is provided in advance with a dictionary that records the one-to-one correspondence between words and one-dimensional row vectors. For example, “我” (I) may be set to correspond to "row vector 1", “和” (and) to "row vector 2", “你” (you) to "row vector 3", and so on. The dictionary is completed by covering as many words as possible, so that when the target text needs to be converted, the server can use the preset dictionary to convert each word in the target text into a one-dimensional row vector. For example, suppose the target text is “我和你去吃饭” (you and I go to eat). Looking up the dictionary shows that “我” corresponds to row vector 1, “和” to row vector 2, “你” to row vector 3, “去” (go) to row vector 4, and “吃饭” (eat) to row vector 5, which yields row vectors 1 to 5. Here, "row vectors 1 to 5" refers to the row vectors numbered 1, 2, 3, 4, and 5; each row vector is a one-dimensional matrix containing multiple elements. For example, [7, 51, 423, 50, 0] is a one-dimensional row vector, and when the dictionary is set up it can be defined as row vector k, where k is greater than or equal to 1.
[0049] Preferably, the dictionary can be built automatically: the server can populate the dictionary while using it. Specifically, when text needs to be converted into one-dimensional row vectors, the server takes the words of the text one by one and checks whether the dictionary records a correspondence between the word and a one-dimensional row vector. If it does, the server obtains the row vector corresponding to the word. If it does not, the server adds the word to the dictionary, assigns it a previously unassigned one-dimensional row vector, and then obtains that row vector. Once the server has obtained the row vectors for all words in the text, it can perform step 202 below to construct the two-dimensional vector. At the same time, the words of the text that were not yet in the dictionary have been added to it, so the dictionary becomes more complete.
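The automatic dictionary construction described in [0049] can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the class name `GrowingDictionary`, the pool of unassigned row vectors, and the two-element vectors are all assumptions made for the example.

```python
from typing import Dict, List


class GrowingDictionary:
    """Word -> row-vector table that adds unseen words on first lookup."""

    def __init__(self, unassigned: List[List[float]]) -> None:
        self.table: Dict[str, List[float]] = {}
        # Hypothetical pool of row vectors not yet assigned to any word.
        self._unassigned = list(unassigned)

    def lookup(self, word: str) -> List[float]:
        if word not in self.table:
            if not self._unassigned:
                raise LookupError("no unassigned row vector left for " + word)
            # Unseen word: assign the next free row vector and remember it.
            self.table[word] = self._unassigned.pop(0)
        return self.table[word]


pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dictionary = GrowingDictionary(pool)
# "我" appears twice and must map to the same row vector both times.
rows = [dictionary.lookup(w) for w in ["我", "和", "你", "我"]]
```

In practice the pool would more likely come from pre-trained word vectors than from a hand-written list.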
[0050] It should be noted that, when setting up the dictionary, the unassigned one-dimensional row vectors may be set manually by staff, or existing word vectors may be obtained from a third-party platform. For example, word vectors provided by websites such as Sina or Zhihu may be loaded as the one-dimensional row vectors needed to set up the dictionary in this embodiment.
[0051] For step 202 above, the five one-dimensional row vectors numbered 1, 2, 3, 4, and 5 are stacked in order to form a two-dimensional matrix, that is, a two-dimensional vector, which is the target vector. With X1 to X5 denoting row vectors 1 to 5 respectively, the rows of the target vector are X1, X2, X3, X4, and X5 in that order.
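Steps 201 and 202 together amount to a table lookup followed by stacking. The sketch below assumes a small fixed dictionary; only the vector [7, 51, 423, 50, 0] comes from the description, and the other row vectors, like the function name `text_to_target_vector`, are hypothetical.

```python
from typing import Dict, List

Row = List[float]


def text_to_target_vector(words: List[str], dictionary: Dict[str, Row]) -> List[Row]:
    """Look up each word (step 201) and stack the rows in word order (step 202)."""
    return [list(dictionary[word]) for word in words]


# Illustrative dictionary; apart from the first row, the values are made up.
dictionary: Dict[str, Row] = {
    "我": [7, 51, 423, 50, 0],
    "和": [3, 8, 15, 0, 2],
    "你": [9, 1, 77, 4, 6],
    "去": [0, 12, 5, 31, 8],
    "吃饭": [2, 2, 64, 9, 1],
}

words = ["我", "和", "你", "去", "吃饭"]
target_vector = text_to_target_vector(words, dictionary)  # 5 rows x 5 columns
```

The result is a list of rows X1 to X5, one per word, which is the two-dimensional target vector in the sense of [0051].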
[0052] Given the diversity of users, the target text in step 102 may well not meet format requirements or may contain a lot of interfering information. Therefore, in this embodiment the target text may be preprocessed before it is converted into the target vector, so that its format and content are better suited to vector conversion and to subsequent recognition and analysis by the convolutional neural network. As shown in FIG. 4, before step 102 the method further includes:
[0053] 301. Delete specified text from the target text, where the specified text includes at least stop words or punctuation marks;
[0054] 302. Perform word segmentation on the target text from which the specified text has been deleted, to obtain the words of the target text.
[0055] For step 301 above, stop words here may refer to single Chinese characters with a particularly high frequency of use and no real linguistic meaning, such as “的” and “了”. The specified text may also include punctuation marks, such as commas and periods, which likewise carry no real linguistic meaning. When performing step 301, the server deletes the specified text from the target text. For example, suppose the specified text includes stop words and punctuation marks and the target text contains “我今天来上班了。” (I came to work today.). The server first deletes meaningless stop words such as “了” and punctuation marks such as “。”, which yields “我今天来上班”.
[0056] For step 302 above, after deleting the specified text, the server performs word segmentation on the target text. Continuing with the text “我今天来上班”, the server can use a third-party word segmentation tool to split the sentence into the four words “我 今天 来 上班” (I / today / come / work).
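Steps 301 and 302 can be illustrated with the example sentence from the description. The description relies on an unnamed third-party word segmentation tool; as a stand-in, the sketch below uses a simple forward maximum matching segmenter over a tiny hypothetical vocabulary. The stop-word and punctuation sets are likewise assumptions.

```python
from typing import List, Set

STOP_WORDS = {"的", "了"}                # assumed single-character stop words
PUNCTUATION = set("，。！？、；：,.!?;:")  # assumed punctuation set


def remove_specified_text(text: str) -> str:
    """Step 301: drop stop-word characters and punctuation marks."""
    # Character-level deletion is a simplification; it would also hit
    # stop-word characters that occur inside longer words.
    return "".join(ch for ch in text if ch not in STOP_WORDS and ch not in PUNCTUATION)


def segment(text: str, vocabulary: Set[str]) -> List[str]:
    """Step 302 stand-in: forward maximum matching against a vocabulary."""
    max_len = max((len(word) for word in vocabulary), default=1)
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                words.append(piece)
                i += size
                break
    return words


vocabulary = {"我", "今天", "来", "上班"}
cleaned = remove_specified_text("我今天来上班了。")
tokens = segment(cleaned, vocabulary)
```

A production system would replace `segment` with a dedicated segmentation library; the matching rule here is only meant to make the two-step flow concrete.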
[0057] 103. Feed the target vector as input into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, where each element of the target result vector is a first probability value corresponding to one preset user intent, and a first probability value represents the probability that the target text belongs to the corresponding preset user intent;
[0058] After obtaining the target vector corresponding to the target text, the server feeds the target vector as input into the pre-trained convolutional neural network and obtains the target result vector that the network outputs. The target result vector generally contains multiple elements, each of which is a first probability value. These first probability values correspond one-to-one with the preset user intents, and each represents the probability that the target text belongs to the corresponding preset user intent. The larger the first probability value for a preset user intent, the more likely it is that the target text belongs to that intent.
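Reading the target result vector reduces to picking the position with the largest first probability value, as in step 104 of the method. The sketch below is illustrative only: the function name `pick_intent` and the probability values are assumptions, and the intent names are the examples used elsewhere in the description.

```python
from typing import Sequence, Tuple


def pick_intent(result_vector: Sequence[float], intents: Sequence[str]) -> Tuple[str, float]:
    """Return the preset intent with the highest first probability value."""
    if len(result_vector) != len(intents):
        raise ValueError("result vector and intent list must have the same length")
    best = max(range(len(intents)), key=lambda i: result_vector[i])
    return intents[best], result_vector[best]


INTENTS = ["agree to listen", "refuse to buy", "willing to wait"]
# Made-up network output: one first probability value per preset intent.
intent, probability = pick_intent([0.15, 0.70, 0.15], INTENTS)
```

On ties, `max` returns the first maximal index, so the earliest listed intent wins; the description does not say how ties should be handled.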
[0059] For ease of understanding, the training process of the convolutional neural network is described in detail below. As shown in FIG. 5, the convolutional neural network is pre-trained through the following steps:
[0060] 401. Collect the utterance texts belonging to each preset user intent;
[0061] 402. Vectorize each collected utterance text to obtain the sample vector corresponding to each utterance text;
[0062] 403. For each preset user intent, set the label value of the sample vectors corresponding to that preset user intent to 1 and the label value of the other sample vectors to 0, where the other sample vectors are the sample vectors other than those corresponding to that preset user intent;
[0063] 404. For each preset user intent, feed all sample vectors as input into the convolutional neural network for training to obtain sample result vectors, where each sample result vector consists of elements that respectively represent the probability that the utterance text corresponding to the sample vector belongs to each preset user intent;
[0064] 405. For each preset user intent, take the output sample result vectors as the adjustment target and adjust the parameters of the convolutional neural network so as to minimize the error between the sample result vectors and the label values of the corresponding sample vectors;
[0065] 406. If the error between the sample result vectors and the label values of the corresponding sample vectors satisfies a preset training termination condition, determine that the convolutional neural network has been trained.
[0066] For step 401 above, in this embodiment, staff may set on the server in advance the preset user intents to be trained for the actual application scenario, for example "agree to listen", "refuse to buy", and "willing to wait". For these preset user intents, staff also need to collect the corresponding utterance texts in the specific application scenario, such as utterance texts converted from questions that users actually asked. When collecting utterance texts, the server may gather the utterance texts belonging to each preset user intent from sources such as professional knowledge bases and online databases. It should be noted that the utterance texts for each preset user intent should reach a certain order of magnitude. The number of utterance texts may differ somewhat between preset user intents, but should not differ too much, so as not to degrade the training of the convolutional neural network. For example, the collected utterance texts might number 1,000,000 for "agree to listen", 200,000 for "refuse to buy", and 300,000 for "willing to wait".
[0067] For step 402 above, before the utterance texts are fed into the convolutional neural network for training, each collected utterance text needs to be vectorized to obtain its corresponding sample vector, because text converted into vectors is easier for the convolutional neural network to process and train on. Since the collected utterance texts come from many sources, their formats are often inconsistent, which can easily interfere with subsequent training. Therefore, the server may preprocess these utterance texts before vectorizing them, including deleting stop words and punctuation marks and segmenting the text into words. For example, suppose an utterance text is “我今天来上班了。” (I came to work today.). The server first deletes meaningless stop words such as “了” and punctuation marks such as “。”, and then uses a third-party word segmentation tool to split the sentence into the four words “我 今天 来 上班” (I / today / come / work). After preprocessing, the server maps each word of the utterance text to a vector, obtaining one row vector per word; these row vectors together form the sample vector (a two-dimensional vector) corresponding to the utterance text. Specifically, the sample vector may be recorded in the form of a data matrix.
[0068] For step 403 above, it can be understood that the sample vectors need to be labeled before training. Because this embodiment trains for multiple preset user intentions, labeling should be performed separately for each preset user intention. For example, assume there are three preset user intentions: "agree to listen", "refuse to buy", and "willing to wait". Then, for "agree to listen", the label value of each sample vector under "agree to listen" is set to 1 and the label value of each sample vector under "refuse to buy" and "willing to wait" is set to 0, and these labels are used in the subsequent training of the convolutional neural network for "agree to listen". Similarly, for "refuse to buy", the label value of each sample vector under "refuse to buy" is set to 1 and the label value of each sample vector under "agree to listen" and "willing to wait" is set to 0, and these labels are used in the subsequent training of the convolutional neural network for "refuse to buy". "Willing to wait" and any other preset user intentions are handled in the same way, and the details are not repeated here.
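The per-intention labeling above amounts to a one-versus-rest scheme, which can be sketched as follows. The function name and the placeholder sample names are assumptions made for illustration; real sample vectors would be the two-dimensional vectors from step 402.

```python
def label_for_intention(samples_by_intention, target_intention):
    """Return (sample_vector, label) pairs for one preset user intention:
    label 1 for samples under target_intention, label 0 for all others."""
    labeled = []
    for intention, samples in samples_by_intention.items():
        label = 1 if intention == target_intention else 0
        for sample in samples:
            labeled.append((sample, label))
    return labeled

# Placeholder strings stand in for the real two-dimensional sample vectors.
samples_by_intention = {
    "agree to listen": ["vec_a1", "vec_a2"],
    "refuse to buy": ["vec_r1"],
    "willing to wait": ["vec_w1"],
}

labeled = label_for_intention(samples_by_intention, "agree to listen")
```

Calling the function once per preset user intention produces the separate label sets that the text describes.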
[0069] For step 404 above, during training, for each preset user intention, all sample vectors are fed as input into the convolutional neural network for training to obtain sample result vectors. It can be understood that, for a given preset user intention, the label value of each sample vector under that preset user intention is 1 and all other label values are 0. After a sample vector is input into the convolutional neural network, the network outputs a sample result vector composed of N elements, which respectively represent the probabilities that the utterance text corresponding to the sample vector belongs to each of the N preset user intentions.
[0070] Further, step 404 may specifically include: when each sample vector is input into the convolutional neural network for training, randomly discarding feature maps output by the convolutional layer according to a preset first discard probability, using convolution windows of each preset size to extract features from the feature maps and perform a pooling operation, then randomly discarding the output vectors obtained by the pooling operation according to a preset second discard probability, and inputting the remaining output vectors into the fully connected layer to obtain the sample result vector. The first discard probability and the second discard probability may be set according to actual usage, for example to 0.6. It can be seen that discarding of feature maps and discarding of the pooled output vectors are added here. Although this weakens the effect of each individual training pass on the convolutional neural network to some extent, it greatly speeds up each pass, so that the network can complete training on a large number of samples in a short time; after training on a large number of samples, this is on the whole more beneficial to the training effect of the convolutional neural network, and the recognition accuracy is higher.
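The two discard operations can be illustrated with the minimal sketch below. It assumes that "discarding" is realized by zeroing values, as in conventional dropout, and that max pooling is used; the text specifies neither. Conventional dropout implementations usually also rescale the kept values, which is omitted here because the text does not mention it.

```python
import random

def discard_feature_maps(feature_maps, discard_prob, rng):
    """Zero out each whole feature map with probability discard_prob."""
    return [
        [0.0] * len(fmap) if rng.random() < discard_prob else list(fmap)
        for fmap in feature_maps
    ]

def discard_elements(vector, discard_prob, rng):
    """Zero out each element of a pooled output vector with probability discard_prob."""
    return [0.0 if rng.random() < discard_prob else value for value in vector]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
feature_maps = [[0.2, 0.9, 0.4], [0.7, 0.1, 0.3], [0.5, 0.5, 0.8]]

kept_maps = discard_feature_maps(feature_maps, 0.6, rng)   # first discard probability
pooled = [max(fmap) for fmap in kept_maps]                 # max pooling (assumed)
kept_pooled = discard_elements(pooled, 0.6, rng)           # second discard probability
```

Both functions are meant for training only; at recognition time they would be skipped entirely.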
[0071] For step 405 above, it can be understood that the parameters of the convolutional neural network need to be adjusted during training. For example, the network structure of the convolutional neural network mainly includes a convolutional layer, a pooling layer, a dropout (random deactivation) layer, a regularization layer, and a softmax layer, each with a number of parameters; while training on a sample, adjusting these parameters changes the output of the convolutional neural network. For example, assume that for the preset user intention "agree to listen", a sample vector under "agree to listen" is fed into the convolutional neural network and the output is [0.3, 0.2, 0.5]. The values of the three elements represent the probabilities that the utterance text corresponding to the sample vector belongs to the three preset user intentions "agree to listen", "refuse to buy", and "willing to wait", respectively: the probability of "agree to listen" is 0.3, the probability of "refuse to buy" is 0.2, and the probability of "willing to wait" is 0.5. Because the label value of this sample vector is 1, the utterance text is known to belong to "agree to listen". The parameters of the convolutional neural network can therefore be adjusted so that the output is as close as possible to [1, 0, 0]; most importantly, the value of the element corresponding to "agree to listen" should be as close to 1 as possible.
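To make the notion of "error" in the example concrete, the sketch below evaluates two common error measures for the output [0.3, 0.2, 0.5] against the desired output [1, 0, 0]. The text does not state which error function is used; squared error and cross-entropy are shown here as assumptions, and the `after` vector is a hypothetical output following some parameter updates.

```python
import math

def squared_error(output, target):
    """Sum of squared differences between output and target."""
    return sum((o - t) ** 2 for o, t in zip(output, target))

def cross_entropy(output, target):
    """Negative log-probability assigned to the true intention."""
    return -sum(t * math.log(o) for o, t in zip(output, target) if t > 0)

target = [1.0, 0.0, 0.0]   # the sample belongs to "agree to listen"
before = [0.3, 0.2, 0.5]   # network output from the example in the text
after = [0.8, 0.1, 0.1]    # hypothetical output after parameter adjustment

se_before = squared_error(before, target)   # 0.49 + 0.04 + 0.25, about 0.78
se_after = squared_error(after, target)     # 0.04 + 0.01 + 0.01, about 0.06
ce_before = cross_entropy(before, target)   # -ln(0.3), about 1.204
ce_after = cross_entropy(after, target)     # -ln(0.8), about 0.223
```

Under either measure the error falls as the element for "agree to listen" approaches 1, which is the direction in which the parameters are adjusted.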
[0072] For step 406 above, after steps 403 to 405 have been performed for each preset user intention, it can be determined whether the errors between the sample result vectors and the label values corresponding to the sample vectors satisfy a preset training termination condition. If so, the parameters of the convolutional neural network have been adjusted into place and the network can be determined to be trained; otherwise, the network needs further training. The training termination condition may be preset according to actual usage. Specifically, it may be set as follows: if the errors between the sample result vectors and the label values corresponding to the sample vectors are all smaller than a specified error value, the preset training termination condition is considered satisfied. Alternatively, it may be set as follows: steps 402 to 404 are performed using the utterance texts in a validation set, and if the error between the sample result vectors output by the convolutional neural network and the label values is within a certain range, the preset training termination condition is considered satisfied. The utterance texts in the validation set are collected in a manner similar to step 401. Specifically, after step 401 has been performed to collect the utterance texts for each preset user intention, a certain proportion of the collected utterance texts may be assigned to the training set and the remainder to the validation set. For example, 80% of the collected utterance texts may be randomly assigned as samples of the training set used to train the convolutional neural network, and the other 20% as samples of the validation set used to verify whether training is complete, that is, whether the preset training termination condition is satisfied.
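The 80/20 split and the first form of the termination condition can be sketched as follows. The function names, the fixed seed, and the example error threshold are assumptions introduced for illustration; the text leaves the specified error value open.

```python
import random

def split_train_validation(texts, train_ratio=0.8, seed=0):
    """Randomly assign train_ratio of the texts to the training set
    and the remainder to the validation set."""
    shuffled = list(texts)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def training_finished(errors, max_error):
    """Termination condition: every per-sample error is below max_error."""
    return all(error < max_error for error in errors)

texts = [f"utterance {i}" for i in range(10)]   # placeholder utterance texts
train_set, validation_set = split_train_validation(texts)

# 0.05 is a hypothetical specified error value.
done = training_finished([0.01, 0.03, 0.02], max_error=0.05)
```

With ten placeholder texts the split yields eight training samples and two validation samples, and no text appears in both sets.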
[0073] In this embodiment, as can be seen from the description of step 404 above, a random discard mechanism may be added during training to improve the training efficiency of the convolutional neural network. By contrast, when the trained convolutional neural network is used for recognition, the random discard mechanism is not used, so as to ensure recognition accuracy. For ease of understanding, further, step 103 may specifically include: after the target vector is input into the convolutional neural network, for the feature maps output by the convolutional layer in the convolutional neural network, using the convolution windows in the convolutional neural network to extract features from the feature maps and perform a pooling operation, and then inputting the output vectors obtained by the pooling operation into the fully connected layer in the convolutional neural network to obtain the target result vector. It should be noted that the convolution windows used here are those of the trained convolutional neural network; their sizes and number were already fixed during training and need not be considered here.
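The recognition-time forward pass described above can be sketched in plain Python as follows. This is an illustrative reading, not the network defined by the application: it assumes convolution windows that span the full width of each row vector, a ReLU activation, max pooling over each feature map, and a softmax after the fully connected layer. All weights shown are toy values.

```python
import math

def convolve(matrix, window):
    """Slide a window (h rows, full row width) down the word axis and
    return one feature value per position. Requires len(matrix) >= h."""
    h = len(window)
    feature_map = []
    for start in range(len(matrix) - h + 1):
        total = 0.0
        for r in range(h):
            for x, w in zip(matrix[start + r], window[r]):
                total += x * w
        feature_map.append(max(0.0, total))  # ReLU activation (assumed)
    return feature_map

def softmax(scores):
    """Convert scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def recognize(target_vector, windows, fc_weights, fc_bias):
    """Convolution, max pooling, fully connected layer, softmax; no random discard."""
    pooled = [max(convolve(target_vector, w)) for w in windows]
    scores = [
        sum(p * wt for p, wt in zip(pooled, row)) + b
        for row, b in zip(fc_weights, fc_bias)
    ]
    return softmax(scores)

# Toy target vector: three words, each a row vector of length 2.
target_vector = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
windows = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]    # window heights 2 and 1
fc_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # 3 intentions x 2 pooled features
fc_bias = [0.0, 0.0, 0.0]

target_result_vector = recognize(target_vector, windows, fc_weights, fc_bias)
```

The returned list has one probability per preset user intention, and no discard step appears anywhere in the pass.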
[0074] 104. Determine the preset user intention with the highest first probability value as the target user intention corresponding to the target text.
[0075] It can be understood that, after the server obtains the target result vector output by the convolutional neural network, each element of the target result vector is the first probability value corresponding to one preset user intention, and the first probability value represents the probability that the target text belongs to that preset user intention. This means that the higher the first probability value, the more likely it is that the target text belongs to the corresponding preset user intention. Therefore, the server determines the preset user intention with the highest first probability value as the target user intention corresponding to the target text, which captures the user's actual situation and true intention to the greatest extent.
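Selecting the target user intention is an arg-max over the target result vector, as in the minimal sketch below. The function name is an assumption, and the probabilities reuse the illustrative values from the training example.

```python
def determine_intention(target_result_vector, preset_intentions):
    """Return the preset user intention with the highest first probability
    value, together with that value."""
    best = max(range(len(target_result_vector)),
               key=target_result_vector.__getitem__)
    return preset_intentions[best], target_result_vector[best]

preset_intentions = ["agree to listen", "refuse to buy", "willing to wait"]
intention, probability = determine_intention([0.3, 0.2, 0.5], preset_intentions)
```

If two intentions tie for the highest value, this sketch returns the first one; the text does not say how ties should be resolved.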
[0076] In the embodiments of this application, first, the target text whose intention is to be recognized is obtained; then, the target text is vectorized to obtain a target vector; next, the target vector is fed as input into a pre-trained convolutional neural network to obtain the target result vector output by the convolutional neural network, where each element of the target result vector is the first probability value corresponding to one preset user intention, and the first probability value represents the probability that the target text belongs to that preset user intention; finally, the preset user intention with the highest first probability value is determined as the target user intention corresponding to the target text. It can be seen that this application can use a pre-trained convolutional neural network to accurately recognize the user's true intention from the target text. This not only avoids recognition deviations caused by differences in experience and knowledge, but also eliminates the influence of human subjective factors, improves the accuracy of intention recognition, and helps enterprises grasp users' true intentions and close transactions.
[0077] It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
[0078]
[0079] In one embodiment, an intention recognition apparatus based on a convolutional neural network is provided, which corresponds one-to-one to the intention recognition method based on a convolutional neural network in the above embodiments. As shown in FIG. 6, the intention recognition apparatus based on a convolutional neural network includes a target text acquisition module 501, a text vectorization module 502, a network recognition module 503, and an intention determination module 504. The functional modules are described in detail as follows:
[0080] The target text acquisition module 501 is configured to obtain the target text whose intention is to be recognized;
[0081] The text vectorization module 502 is configured to vectorize the target text to obtain a target vector;
[0082] The network recognition module 503 is configured to feed the target vector as input into a pre-trained convolutional neural network to obtain the target result vector output by the convolutional neural network, where each element of the target result vector is the first probability value corresponding to one preset user intention, and the first probability value represents the probability that the target text belongs to that preset user intention;
[0083] The intention determination module 504 is configured to determine the preset user intention with the highest first probability value as the target user intention corresponding to the target text.
[0084] As shown in FIG. 7, further, the convolutional neural network may be pre-trained by the following modules:
[0085] The text collection module 505 is configured to separately collect the utterance texts belonging to each preset user intention;
[0086] The sample vectorization module 506 is configured to vectorize each collected utterance text to obtain the sample vector corresponding to each utterance text;
[0087] The sample labeling module 507 is configured to, for each preset user intention, set the label value of the sample vectors corresponding to the preset user intention to 1 and the label value of the other sample vectors to 0, where the other sample vectors are the sample vectors other than those corresponding to the preset user intention;
[0088] The sample input module 508 is configured to, for each preset user intention, feed all sample vectors as input into the convolutional neural network for training to obtain sample result vectors, where each sample result vector is composed of elements that respectively represent the probabilities that the utterance text corresponding to the sample vector belongs to each preset user intention;
[0089] The network parameter adjustment module 509 is configured to, for each preset user intention, take the output sample result vectors as the adjustment target and adjust the parameters of the convolutional neural network so as to minimize the errors between the obtained sample result vectors and the label values corresponding to the sample vectors;
[0090] The training completion determination module 510 is configured to determine that the convolutional neural network has been trained if the errors between the sample result vectors and the label values corresponding to the sample vectors satisfy a preset training termination condition.
[0091] As shown in FIG. 8, further, the network recognition module 503 may include:
[0092] A neural network recognition unit 5031, configured to, after the target vector is input into the convolutional neural network, for the feature maps output by the convolutional layer in the convolutional neural network, use the convolution windows in the convolutional neural network to extract features from the feature maps and perform a pooling operation, and then input the output vectors obtained by the pooling operation into the fully connected layer in the convolutional neural network to obtain the target result vector.
[0093] Further, the text vectorization module may include:
[0094] A one-dimensional vector conversion unit, configured to use a preset dictionary to convert each word in the target text into a one-dimensional row vector, where the dictionary records the correspondence between words and one-dimensional row vectors;
[0095] A two-dimensional vector composition unit, configured to combine the one-dimensional row vectors, in the order of the words in the target text, into a two-dimensional vector that serves as the target vector.
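The dictionary lookup and stacking performed by these two units can be sketched as follows. The dictionary contents and vector length are hypothetical, and the zero-vector fallback for words missing from the dictionary is an assumption; the text does not say how such words are handled.

```python
# Hypothetical preset dictionary: word -> one-dimensional row vector.
DICTIONARY = {
    "我": [0.1, 0.3, 0.5],
    "今天": [0.2, 0.1, 0.4],
    "来": [0.7, 0.2, 0.0],
    "上班": [0.6, 0.9, 0.3],
}
UNKNOWN = [0.0, 0.0, 0.0]  # assumed fallback for out-of-dictionary words

def to_target_vector(words, dictionary=DICTIONARY, unknown=UNKNOWN):
    """Look up each word's row vector and stack the rows in word order,
    giving a two-dimensional vector (a list of rows)."""
    return [list(dictionary.get(word, unknown)) for word in words]

target_vector = to_target_vector(["我", "今天", "来", "上班"])
```

The result has one row per word and one column per vector component, which matches the data-matrix form used for sample vectors during training.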
[0096] Further, the intention recognition apparatus based on a convolutional neural network may also include:
[0097] A specified text deletion module, configured to delete specified text from the target text, where the specified text includes at least stop words or punctuation marks;
[0098] A word segmentation module, configured to perform word segmentation on the target text after the specified text has been deleted, to obtain the words in the target text.
[0099]
[0100] For specific limitations on the intention recognition apparatus based on a convolutional neural network, reference may be made to the limitations on the intention recognition method based on a convolutional neural network above, which are not repeated here. Each module in the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in or independent of a processor in a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
[0101] In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the readable storage medium. The database of the computer device stores the data involved in the intention recognition method based on a convolutional neural network. The network interface of the computer device communicates with external terminals through a network connection. When executed by the processor, the computer-readable instructions implement an intention recognition method based on a convolutional neural network. The readable storage medium provided in this embodiment includes non-volatile readable storage media and volatile readable storage media.
[0102] In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the steps of the intention recognition method based on a convolutional neural network in the above embodiments are implemented, for example steps 101 to 104 shown in FIG. 2. Alternatively, when the processor executes the computer-readable instructions, the functions of the modules/units of the intention recognition apparatus based on a convolutional neural network in the above embodiments are implemented, for example the functions of modules 501 to 504 shown in FIG. 6. To avoid repetition, the details are not described again here.
[0103] In one embodiment, a computer-readable storage medium is provided, namely one or more readable storage media storing computer-readable instructions. When executed by one or more processors, the computer-readable instructions cause the one or more processors to implement the steps of the intention recognition method based on a convolutional neural network in the above method embodiments, or to implement the functions of the modules/units of the intention recognition apparatus based on a convolutional neural network in the above apparatus embodiments. To avoid repetition, the details are not described again here. The readable storage medium provided in this embodiment includes non-volatile readable storage media and volatile readable storage media.
[0104] A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be completed by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
[0105] Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is used only as an example. In practical applications, the above functions may be assigned to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
[0106] The above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application.

Claims

[Claim 1] An intention recognition method based on a convolutional neural network, characterized by comprising:
obtaining a target text whose intention is to be recognized;
vectorizing the target text to obtain a target vector;
feeding the target vector as input into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, wherein each element of the target result vector is a first probability value corresponding to one preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention;
determining the preset user intention with the highest first probability value as the target user intention corresponding to the target text.
[Claim 2] The intention recognition method based on a convolutional neural network according to claim 1, characterized in that the convolutional neural network is pre-trained through the following steps:
separately collecting the utterance texts belonging to each preset user intention;
vectorizing each collected utterance text to obtain the sample vector corresponding to each utterance text;
for each preset user intention, setting the label value of the sample vectors corresponding to the preset user intention to 1 and the label value of the other sample vectors to 0, wherein the other sample vectors are the sample vectors other than those corresponding to the preset user intention;
for each preset user intention, feeding all sample vectors as input into the convolutional neural network for training to obtain sample result vectors, wherein each sample result vector is composed of elements that respectively represent the probabilities that the utterance text corresponding to the sample vector belongs to each preset user intention;
for each preset user intention, taking the output sample result vectors as the adjustment target and adjusting the parameters of the convolutional neural network so as to minimize the errors between the obtained sample result vectors and the label values corresponding to the sample vectors;
if the errors between the sample result vectors and the label values corresponding to the sample vectors satisfy a preset training termination condition, determining that the convolutional neural network has been trained.
[Claim 3] The convolutional neural network-based intention recognition method according to claim 1, wherein feeding the target vector as input into the pre-trained convolutional neural network to obtain the target result vector output by the convolutional neural network comprises:
after the target vector is input into the convolutional neural network, for the feature map output by a convolutional layer of the convolutional neural network, using the convolution windows of the convolutional neural network to extract features from the feature map and performing a pooling operation, and then inputting the output vector obtained by the pooling operation into a fully connected layer of the convolutional neural network to obtain the target result vector.
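As a rough sketch of the pipeline this claim describes (convolution windows, pooling, fully connected layer), the following pure-Python forward pass may help. The ReLU activation, max pooling over positions, the softmax at the output, and all weights are assumptions; the claim itself only requires that the output elements be probabilities.

```python
import math


def conv_window(matrix, kernel):
    """Slide a window of len(kernel) rows down the word matrix; one value per position."""
    height = len(kernel)
    feature_map = []
    for i in range(len(matrix) - height + 1):
        s = 0.0
        for r in range(height):
            for c in range(len(kernel[r])):
                s += matrix[i + r][c] * kernel[r][c]
        feature_map.append(max(0.0, s))  # ReLU activation (assumed)
    return feature_map


def fully_connected(features, weights, biases):
    """One output score per preset intention."""
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn scores into probabilities that sum to 1 (assumed output layer)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(matrix, kernels, weights, biases):
    # Max pooling (assumed) keeps the strongest response of each window.
    pooled = [max(conv_window(matrix, k)) for k in kernels]
    return softmax(fully_connected(pooled, weights, biases))


# Hypothetical 3-word input with 2-dimensional word vectors and two windows.
matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
probs = forward(matrix, kernels, [[1.0, 0.0], [0.0, 0.5]], [0.0, 0.0])
```

The input must contain at least as many words as the tallest window, otherwise a feature map is empty and pooling fails; a production system would typically pad short inputs.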
[Claim 4] The convolutional neural network-based intention recognition method according to claim 1, wherein vectorizing the target text to obtain the target vector comprises:
converting each word in the target text into a one-dimensional row vector using a preset dictionary, the dictionary recording the correspondence between words and one-dimensional row vectors;
combining the one-dimensional row vectors, in the order of the words in the target text, into a two-dimensional vector that serves as the target vector.
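The dictionary lookup and stacking described in this claim can be sketched as follows. The dictionary contents are made up, and mapping out-of-dictionary words to a zero vector is an assumption, since the claim does not say how unknown words are handled.

```python
def vectorize(words, dictionary):
    """Look up each word's row vector and stack the rows in word order."""
    dim = len(next(iter(dictionary.values())))
    unknown = [0.0] * dim  # assumed fallback for words missing from the dictionary
    return [list(dictionary.get(word, unknown)) for word in words]


# Hypothetical preset dictionary: word -> one-dimensional row vector.
dictionary = {
    "I": [0.1, 0.2],
    "want": [0.3, 0.4],
    "loan": [0.5, 0.6],
}
target_vector = vectorize(["I", "want", "a", "loan"], dictionary)
```

The result is a list of rows, one per word, which plays the role of the two-dimensional target vector fed to the network.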
[Claim 5] The convolutional neural network-based intention recognition method according to any one of claims 1 to 4, wherein, before vectorizing the target text to obtain the target vector, the method further comprises:
deleting specified text from the target text, the specified text comprising at least stop words or punctuation marks;
performing word segmentation on the target text after the specified text has been deleted, to obtain the words in the target text.
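A minimal sketch of this preprocessing, in the order the claim gives it (delete first, then segment). It uses an English sentence and whitespace splitting for simplicity; Chinese text would need a dedicated word segmenter, and the stop-word list here is hypothetical.

```python
import re
import string

STOP_WORDS = {"the", "a", "to", "please"}  # hypothetical stop-word list


def delete_specified_text(text, stop_words=STOP_WORDS):
    """Remove ASCII punctuation, then remove whole-word stop words."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    if not stop_words:
        return no_punct
    pattern = r"\b(?:" + "|".join(map(re.escape, sorted(stop_words))) + r")\b"
    return re.sub(pattern, " ", no_punct, flags=re.IGNORECASE)


def segment(text):
    """Whitespace segmentation; a stand-in for a real word segmenter."""
    return text.split()


words = segment(delete_specified_text("Please, I want to check the balance!"))
```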
[Claim 6] A convolutional neural network-based intention recognition apparatus, comprising:
a target text acquisition module, configured to acquire a target text whose intention is to be recognized;
a text vectorization module, configured to vectorize the target text to obtain a target vector;
a network recognition module, configured to feed the target vector as input into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, wherein each element of the target result vector is a first probability value corresponding to a respective preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention;
an intention determination module, configured to determine the preset user intention with the highest first probability value as the target user intention corresponding to the target text.
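The intention determination step amounts to an argmax over the target result vector. A small sketch, with made-up intention names and probability values:

```python
def determine_intent(result_vector, preset_intents):
    """Return the preset intention with the highest first probability value."""
    best = max(range(len(result_vector)), key=result_vector.__getitem__)
    return preset_intents[best], result_vector[best]


# Hypothetical network output and intention list, aligned by index.
intent, probability = determine_intent(
    [0.1, 0.7, 0.2], ["check_balance", "transfer", "complaint"]
)
```

If two intentions tie, this sketch returns the one that appears first; the claim does not state a tie-breaking rule.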
[Claim 7] The convolutional neural network-based intention recognition apparatus according to claim 6, wherein the convolutional neural network is pre-trained by the following modules:
a text collection module, configured to separately collect utterance texts belonging to each preset user intention;
a sample vectorization module, configured to vectorize each collected utterance text to obtain a sample vector corresponding to each utterance text;
a sample labeling module, configured to, for each preset user intention, set the label value of the sample vectors corresponding to that preset user intention to 1 and the label value of all other sample vectors to 0, the other sample vectors being the sample vectors other than those corresponding to that preset user intention;
a sample input module, configured to, for each preset user intention, feed all sample vectors as input into the convolutional neural network for training to obtain sample result vectors, each sample result vector consisting of elements that respectively represent the probability that the utterance text corresponding to the sample vector belongs to each preset user intention;
a network parameter adjustment module, configured to, for each preset user intention, take the output sample result vectors as the adjustment target and adjust the parameters of the convolutional neural network so as to minimize the error between the sample result vectors and the label values corresponding to the sample vectors;
a training completion determination module, configured to determine that the convolutional neural network has been trained if the error between the sample result vectors and the label values corresponding to the sample vectors satisfies a preset training termination condition.
[Claim 8] The convolutional neural network-based intention recognition apparatus according to claim 6, wherein the network recognition module comprises:
a neural network recognition unit, configured to, after the target vector is input into the convolutional neural network, for the feature map output by a convolutional layer of the convolutional neural network, use the convolution windows of the convolutional neural network to extract features from the feature map and perform a pooling operation, and then input the output vector obtained by the pooling operation into a fully connected layer of the convolutional neural network to obtain the target result vector.
[Claim 9] The convolutional neural network-based intention recognition apparatus according to claim 6, wherein the text vectorization module comprises:
a one-dimensional vector conversion unit, configured to convert each word in the target text into a one-dimensional row vector using a preset dictionary, the dictionary recording the correspondence between words and one-dimensional row vectors;
a two-dimensional vector composition unit, configured to combine the one-dimensional row vectors, in the order of the words in the target text, into a two-dimensional vector that serves as the target vector.
[Claim 10] The convolutional neural network-based intention recognition apparatus according to any one of claims 6 to 9, further comprising:
a specified text deletion module, configured to delete specified text from the target text, the specified text comprising at least stop words or punctuation marks;
a word segmentation module, configured to perform word segmentation on the target text after the specified text has been deleted, to obtain the words in the target text.
[Claim 11] A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
acquiring a target text whose intention is to be recognized;
vectorizing the target text to obtain a target vector;
feeding the target vector as input into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, wherein each element of the target result vector is a first probability value corresponding to a respective preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention;
determining the preset user intention with the highest first probability value as the target user intention corresponding to the target text.
[Claim 12] The computer device according to claim 11, wherein the convolutional neural network is pre-trained by the following steps:
separately collecting utterance texts belonging to each preset user intention;
vectorizing each collected utterance text to obtain a sample vector corresponding to each utterance text;
for each preset user intention, setting the label value of the sample vectors corresponding to that preset user intention to 1 and the label value of all other sample vectors to 0, the other sample vectors being the sample vectors other than those corresponding to that preset user intention;
for each preset user intention, feeding all sample vectors as input into the convolutional neural network for training to obtain sample result vectors, each sample result vector consisting of elements that respectively represent the probability that the utterance text corresponding to the sample vector belongs to each preset user intention;
for each preset user intention, taking the output sample result vectors as the adjustment target and adjusting the parameters of the convolutional neural network so as to minimize the error between the sample result vectors and the label values corresponding to the sample vectors;
if the error between the sample result vectors and the label values corresponding to the sample vectors satisfies a preset training termination condition, determining that the convolutional neural network has been trained.
[Claim 13] The computer device according to claim 11, wherein feeding the target vector as input into the pre-trained convolutional neural network to obtain the target result vector output by the convolutional neural network comprises:
after the target vector is input into the convolutional neural network, for the feature map output by a convolutional layer of the convolutional neural network, using the convolution windows of the convolutional neural network to extract features from the feature map and performing a pooling operation, and then inputting the output vector obtained by the pooling operation into a fully connected layer of the convolutional neural network to obtain the target result vector.
[Claim 14] The computer device according to claim 11, wherein vectorizing the target text to obtain the target vector comprises:
converting each word in the target text into a one-dimensional row vector using a preset dictionary, the dictionary recording the correspondence between words and one-dimensional row vectors;
combining the one-dimensional row vectors, in the order of the words in the target text, into a two-dimensional vector that serves as the target vector.
[Claim 15] The computer device according to any one of claims 11 to 14, wherein, before vectorizing the target text to obtain the target vector, the processor, when executing the computer-readable instructions, further implements the following steps:
deleting specified text from the target text, the specified text comprising at least stop words or punctuation marks;
performing word segmentation on the target text after the specified text has been deleted, to obtain the words in the target text.
[Claim 16] One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
acquiring a target text whose intention is to be recognized;
vectorizing the target text to obtain a target vector;
feeding the target vector as input into a pre-trained convolutional neural network to obtain a target result vector output by the convolutional neural network, wherein each element of the target result vector is a first probability value corresponding to a respective preset user intention, and the first probability value represents the probability that the target text belongs to the corresponding preset user intention;
determining the preset user intention with the highest first probability value as the target user intention corresponding to the target text.
[Claim 17] The readable storage medium according to claim 16, wherein the convolutional neural network is pre-trained by the following steps:
separately collecting utterance texts belonging to each preset user intention;
vectorizing each collected utterance text to obtain a sample vector corresponding to each utterance text;
for each preset user intention, setting the label value of the sample vectors corresponding to that preset user intention to 1 and the label value of all other sample vectors to 0, the other sample vectors being the sample vectors other than those corresponding to that preset user intention;
for each preset user intention, feeding all sample vectors as input into the convolutional neural network for training to obtain sample result vectors, each sample result vector consisting of elements that respectively represent the probability that the utterance text corresponding to the sample vector belongs to each preset user intention;
for each preset user intention, taking the output sample result vectors as the adjustment target and adjusting the parameters of the convolutional neural network so as to minimize the error between the sample result vectors and the label values corresponding to the sample vectors;
if the error between the sample result vectors and the label values corresponding to the sample vectors satisfies a preset training termination condition, determining that the convolutional neural network has been trained.
[Claim 18] The readable storage medium according to claim 16, wherein feeding the target vector as input into the pre-trained convolutional neural network to obtain the target result vector output by the convolutional neural network comprises:
after the target vector is input into the convolutional neural network, for the feature map output by a convolutional layer of the convolutional neural network, using the convolution windows of the convolutional neural network to extract features from the feature map and performing a pooling operation, and then inputting the output vector obtained by the pooling operation into a fully connected layer of the convolutional neural network to obtain the target result vector.
[Claim 19] The readable storage medium according to claim 16, wherein vectorizing the target text to obtain the target vector comprises:
converting each word in the target text into a one-dimensional row vector using a preset dictionary, the dictionary recording the correspondence between words and one-dimensional row vectors;
combining the one-dimensional row vectors, in the order of the words in the target text, into a two-dimensional vector that serves as the target vector.
[Claim 20] The readable storage medium according to any one of claims 16 to 19, wherein, before vectorizing the target text to obtain the target vector, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to further perform the following steps:
deleting specified text from the target text, the specified text comprising at least stop words or punctuation marks;
performing word segmentation on the target text after the specified text has been deleted, to obtain the words in the target text.
PCT/CN2019/117097 2019-01-04 2019-11-11 Convolutional neural network-based intention recognition method, apparatus, device, and medium WO2020140612A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910007860.0 2019-01-04
CN201910007860.0A CN109829153A (en) 2019-01-04 2019-01-04 Intention recognition method, device, equipment and medium based on convolutional neural networks

Publications (1)

Publication Number Publication Date
WO2020140612A1 (en) 2020-07-09

Family

ID=66861465

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117097 WO2020140612A1 (en) 2019-01-04 2019-11-11 Convolutional neural network-based intention recognition method, apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN109829153A (en)
WO (1) WO2020140612A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111986653A (en) * 2020-08-06 2020-11-24 杭州海康威视数字技术股份有限公司 Voice intention recognition method, device and equipment
CN112069311A (en) * 2020-08-04 2020-12-11 北京声智科技有限公司 Text extraction method, device, equipment and medium
CN114090740A (en) * 2021-11-19 2022-02-25 北京有竹居网络技术有限公司 Intention recognition method and device, readable medium and electronic equipment
CN114301864A (en) * 2020-08-14 2022-04-08 腾讯科技(深圳)有限公司 Object identification method, device, storage medium and server
CN115033676A (en) * 2022-06-22 2022-09-09 支付宝(杭州)信息技术有限公司 Intention recognition model training and user intention recognition method and device
CN116186585A (en) * 2023-02-28 2023-05-30 广州朝辉智能科技有限公司 User behavior intention mining method and device based on big data analysis
WO2024001809A1 (en) * 2022-06-28 2024-01-04 华为技术有限公司 Recommendation method and apparatus, and electronic device

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829153A (en) * 2019-01-04 2019-05-31 平安科技(深圳)有限公司 Intention recognition method, device, equipment and medium based on convolutional neural networks
CN110263139A (en) * 2019-06-10 2019-09-20 湖北亿咖通科技有限公司 Vehicle, vehicle device equipment and its text intension recognizing method neural network based
CN110390108B (en) * 2019-07-29 2023-11-21 中国工商银行股份有限公司 Task type interaction method and system based on deep reinforcement learning
CN110689878B (en) * 2019-10-11 2020-07-28 浙江百应科技有限公司 Intelligent voice conversation intention recognition method based on XLNet
CN110909543A (en) * 2019-11-15 2020-03-24 广州洪荒智能科技有限公司 Intention recognition method, device, equipment and medium
CN111091832B (en) * 2019-11-28 2022-12-30 秒针信息技术有限公司 Intention assessment method and system based on voice recognition
CN111161740A (en) * 2019-12-31 2020-05-15 中国建设银行股份有限公司 Intention recognition model training method, intention recognition method and related device
CN111259625B (en) * 2020-01-16 2023-06-27 平安科技(深圳)有限公司 Intention recognition method, device, equipment and computer readable storage medium
CN112287108B (en) * 2020-10-29 2022-08-16 四川长虹电器股份有限公司 Intention recognition optimization method in field of Internet of things
CN112269815A (en) * 2020-10-29 2021-01-26 维沃移动通信有限公司 Structured data processing method and device and electronic equipment
CN112820412B (en) * 2021-02-03 2024-03-08 东软集团股份有限公司 User information processing method and device, storage medium and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649694A (en) * 2016-12-19 2017-05-10 北京云知声信息技术有限公司 Method and device for identifying user's intention in voice interaction
CN107346340A (en) * 2017-07-04 2017-11-14 北京奇艺世纪科技有限公司 A kind of user view recognition methods and system
CN108363690A (en) * 2018-02-08 2018-08-03 北京十三科技有限公司 Dialog semantics Intention Anticipation method based on neural network and learning training method
US20180293978A1 (en) * 2017-04-07 2018-10-11 Conduent Business Services, Llc Performing semantic analyses of user-generated textual and voice content
CN108885870A (en) * 2015-12-01 2018-11-23 流利说人工智能公司 For by combining speech to TEXT system with speech to intention system the system and method to realize voice user interface
CN109829153A (en) * 2019-01-04 2019-05-31 平安科技(深圳)有限公司 Intention recognition method, device, equipment and medium based on convolutional neural networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106815244B (en) * 2015-11-30 2020-02-07 北京国双科技有限公司 Text vector representation method and device

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069311A (en) * 2020-08-04 2020-12-11 北京声智科技有限公司 Text extraction method, device, equipment and medium
CN111986653A (en) * 2020-08-06 2020-11-24 杭州海康威视数字技术股份有限公司 Voice intention recognition method, device and equipment
CN114301864A (en) * 2020-08-14 2022-04-08 腾讯科技(深圳)有限公司 Object identification method, device, storage medium and server
CN114301864B (en) * 2020-08-14 2024-02-02 腾讯科技(深圳)有限公司 Object identification method, device, storage medium and server
CN114090740A (en) * 2021-11-19 2022-02-25 北京有竹居网络技术有限公司 Intention recognition method and device, readable medium and electronic equipment
CN114090740B (en) * 2021-11-19 2023-07-07 北京有竹居网络技术有限公司 Intention recognition method and device, readable medium and electronic equipment
CN115033676A (en) * 2022-06-22 2022-09-09 支付宝(杭州)信息技术有限公司 Intention recognition model training and user intention recognition method and device
CN115033676B (en) * 2022-06-22 2024-04-26 支付宝(杭州)信息技术有限公司 Intention recognition model training and user intention recognition method and device
WO2024001809A1 (en) * 2022-06-28 2024-01-04 华为技术有限公司 Recommendation method and apparatus, and electronic device
CN116186585A (en) * 2023-02-28 2023-05-30 广州朝辉智能科技有限公司 User behavior intention mining method and device based on big data analysis
CN116186585B (en) * 2023-02-28 2023-10-31 省广营销集团有限公司 User behavior intention mining method and device based on big data analysis

Also Published As

Publication number Publication date
CN109829153A (en) 2019-05-31

Similar Documents

Publication Publication Date Title
WO2020140612A1 (en) Convolutional neural network-based intention recognition method, apparatus, device, and medium
WO2020143844A1 (en) Intent analysis method and apparatus, display terminal, and computer readable storage medium
WO2020237869A1 (en) Question intention recognition method and apparatus, computer device, and storage medium
WO2020119031A1 (en) Deep learning-based question and answer feedback method, device, apparatus, and storage medium
WO2020177230A1 (en) Medical data classification method and apparatus based on machine learning, and computer device and storage medium
WO2020147395A1 (en) Emotion-based text classification method and device, and computer apparatus
WO2020048264A1 (en) Method and apparatus for processing drug data, computer device, and storage medium
WO2021003819A1 (en) Man-machine dialog method and man-machine dialog apparatus based on knowledge graph
WO2021068321A1 (en) Information pushing method and apparatus based on human-computer interaction, and computer device
WO2020232877A1 (en) Question answer selection method and apparatus, computer device, and storage medium
WO2020147238A1 (en) Keyword determination method, automatic scoring method, apparatus and device, and medium
CN111709233B (en) Intelligent diagnosis guiding method and system based on multi-attention convolutional neural network
WO2021042503A1 (en) Information classification extraction method, apparatus, computer device and storage medium
CN108595695B (en) Data processing method, data processing device, computer equipment and storage medium
US9361891B1 (en) Method for converting speech to text, performing natural language processing on the text output, extracting data values and matching to an electronic ticket form
CN112036154B (en) Electronic medical record generation method and device based on inquiry dialogue and computer equipment
CN109190110A (en) A kind of training method of Named Entity Extraction Model, system and electronic equipment
JP2021089705A (en) Method and device for evaluating translation quality
WO2021151328A1 (en) Symptom data processing method and apparatus, and computer device and storage medium
WO2020181808A1 (en) Text punctuation prediction method and apparatus, and computer device and storage medium
CN110569500A (en) Text semantic recognition method and device, computer equipment and storage medium
WO2021139243A1 (en) Data processing method, device, and apparatus employing human-computer interaction, and storage medium
WO2021051598A1 (en) Text sentiment analysis model training method, apparatus and device, and readable storage medium
CN111694940A (en) User report generation method and terminal equipment
WO2021208444A1 (en) Method and apparatus for automatically generating electronic cases, a device, and a storage medium

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19907844; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 19907844; Country of ref document: EP; Kind code of ref document: A1)