WO2021169453A1 - Method and apparatus for text processing - Google Patents

Method and apparatus for text processing

Info

Publication number
WO2021169453A1
WO2021169453A1 (PCT/CN2020/132636)
Authority
WO
WIPO (PCT)
Prior art keywords
vector
text
result
question text
encoding
Prior art date
Application number
PCT/CN2020/132636
Other languages
English (en)
French (fr)
Inventor
彭爽
崔恒斌
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司 filed Critical 支付宝(杭州)信息技术有限公司
Publication of WO2021169453A1 publication Critical patent/WO2021169453A1/zh

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Definitions

  • the embodiments of this specification relate to the field of information technology, and more specifically, to methods, devices, computing devices, and machine-readable storage media for text processing.
  • With the continuous development of various technologies such as machine learning, intelligent question answering systems have been developed to make it easier for users to obtain help.
  • An intelligent question answering system can realize human-machine dialogue: for example, a user asks the system a question, and the system answers it automatically. Such a system not only provides great convenience but also reduces the cost of answering questions manually. However, how to make an intelligent question answering system answer users' questions accurately has become one of the problems that need to be solved.
  • an embodiment of the present specification provides a method for text processing, including: receiving a first text vector and a second text vector, wherein the first text vector is used to represent a user question text, the second text vector is used to represent a candidate question text, the candidate question text is obtained from a knowledge base, and the knowledge base includes at least one existing question text and an answer text for each existing question text;
  • using a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN) to encode the first text vector and the second text vector to obtain a first encoding result for the first text vector and a second encoding result for the second text vector; and, based on the first encoding result and the second encoding result, determining the similarity between the user question text and the candidate question text, wherein the similarity is used to determine a response to the user question text.
  • the embodiment of the present specification provides an apparatus for text processing, including: a receiving unit, which receives a first text vector and a second text vector, wherein the first text vector is used to represent the user question text, the second text vector is used to represent a candidate question text, the candidate question text being obtained from a knowledge base that includes at least one existing question text and an answer text corresponding to each existing question text; an encoding unit, which uses RNN and CNN to encode the first text vector and the second text vector to obtain a first encoding result for the first text vector and a second encoding result for the second text vector; and a determining unit, which determines the similarity between the user question text and the candidate question text based on the first encoding result and the second encoding result, wherein the similarity is used to determine the answer to the user question text.
  • the embodiments of this specification provide a computing device, including: at least one processor; and a memory storing executable code that, when executed by the at least one processor, causes the at least one processor to implement the foregoing method.
  • the embodiments of the present specification provide a machine-readable storage medium that stores executable code that, when executed, causes a machine to perform the above-mentioned method.
  • Fig. 1 is a schematic flowchart of an operation process of an intelligent question answering system according to an embodiment.
  • Fig. 2 is a schematic flowchart of a method for text processing according to an embodiment.
  • Fig. 3A is a schematic structural diagram of a similarity model according to an embodiment.
  • Fig. 3B is a schematic structural diagram of a CNN encoder according to an embodiment.
  • Fig. 4 is a schematic block diagram of an apparatus for text processing according to an embodiment.
  • Fig. 5 is a hardware structure diagram of a computing device for text processing according to an embodiment.
  • the term “including” and its variations mean open terms, meaning “including but not limited to”.
  • the term “based on” means “based at least in part on.”
  • the terms “one embodiment” and “an embodiment” mean “at least one embodiment.”
  • the term “another embodiment” means “at least one other embodiment.”
  • the terms “first”, “second”, etc. may refer to different or the same objects. The following may include other definitions, whether explicit or implicit. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout the specification.
  • the knowledge base can be expressed in the form of question-answer pairs. That is, the knowledge base may include at least one question text and answer texts corresponding to these question texts.
  • the knowledge base can be established manually offline. For example, extract question text based on large-scale dialogue data, and then manually supplement the corresponding answer text to build a knowledge base.
  • Question text in the knowledge base can include standard question text and extended question text.
  • the standard question text could be "How to repay Huabai";
  • the extended question texts could be variants such as "How can I repay Huabai", "What are Huabai's repayment methods", and "What is the way to repay Huabai".
  • the intelligent question answering system can retrieve question texts in the knowledge base that are similar to the user's question text, and then output a reply to the user's question text based on the question texts with high similarity.
  • the intelligent question answering system will be described below in conjunction with specific examples.
  • Fig. 1 is a schematic flowchart of an operation process of an intelligent question answering system according to an embodiment.
  • the intelligent question answering system can receive the text of the user's question.
  • the user's question can be received through text or voice.
  • in the voice mode, the received voice can be converted into text form.
  • the intelligent question answering system can search in the knowledge base to obtain N candidate question texts associated with the user's question text, where N is a positive integer.
  • the knowledge base contains a large number of question texts (for example, hundreds of thousands).
  • the intelligent question answering system can first obtain N candidate question texts from the knowledge base through simple retrieval methods such as keyword matching, and then perform the subsequent similarity calculation based on the N candidate question texts. This process can also be referred to as recalling the candidate question texts.
  • N can be preset according to actual needs, experience, etc.
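The recall step described above can be sketched as follows. The character-overlap scoring in `recall_candidates` is a hypothetical stand-in for the keyword matching the text mentions, not the system's actual retrieval method, and the knowledge-base entries are invented for illustration.

```python
# Minimal sketch of candidate recall: score each knowledge-base question by
# character overlap with the user question and keep the top N. A real system
# would use an inverted index, BM25, or similar keyword matching.

def recall_candidates(user_question, knowledge_base, n):
    """Return the n knowledge-base questions sharing the most characters
    with the user question (an illustrative stand-in for keyword matching)."""
    def overlap(question):
        return len(set(user_question) & set(question))
    return sorted(knowledge_base, key=overlap, reverse=True)[:n]

kb = ["how to repay huabai", "how to open huabai",
      "how to change password", "how to contact support"]
candidates = recall_candidates("what is the way to repay huabai", kb, n=2)
```

Here N = 2 is chosen arbitrarily; as the text notes, N can be preset according to actual needs.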
  • the intelligent question answering system can vectorize the user question text and the candidate question text to obtain respective text vectors.
  • various applicable vectorization algorithms can be used to convert the user question text and the candidate question text into text vectors, such as a word embedding algorithm.
  • the similarity between the user question text and each candidate question text can be determined.
  • a response to the user's question text may be output.
  • each candidate question text can be sorted according to the similarity between the user question text and each candidate question text from high to low.
  • the candidate question text with the highest similarity and the corresponding answer text can be output together as a response to the user's question text.
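The ranking-and-reply step can be sketched like this. The scores and question-answer pairs are invented for illustration; in the real system the scores would come from the similarity model described later.

```python
# Sketch of the response step: rank the candidate questions by similarity
# (scores assumed to come from the similarity model) and return the most
# similar question together with its answer text.

def pick_response(scored_candidates, qa_pairs):
    """scored_candidates: list of (question, similarity) pairs.
    qa_pairs: mapping from question text to answer text."""
    best_question, _ = max(scored_candidates, key=lambda pair: pair[1])
    return best_question, qa_pairs[best_question]

scores = [("how to repay huabai", 0.93), ("how to open huabai", 0.41)]
answers = {"how to repay huabai": "Open the app and choose Repay.",
           "how to open huabai": "Apply on the account page."}
question, answer = pick_response(scores, answers)
```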
  • the intelligent question answering system can further output the corresponding answer text based on the user's selection.
  • Fig. 1 is only an example, and in specific implementation, the intelligent question answering system can use other applicable processes to run.
  • the embodiments of the present specification provide a text processing technical solution, which can accurately determine the text similarity, thereby greatly improving the answer accuracy of the intelligent question answering system, and helping to improve the user experience. This will be described in detail below.
  • Fig. 2 is a schematic flowchart of a method for text processing according to an embodiment.
  • step 202 the first text vector and the second text vector may be received.
  • the first text vector may represent the user question text.
  • the second text vector can represent the candidate question text.
  • the candidate question text may be one of the N candidate question texts recalled from the knowledge base at 104 in FIG. 1.
  • the user question text and the candidate question text can be processed by various applicable text vectorization algorithms to obtain the first text vector and the second text vector.
  • a word embedding algorithm can be used to convert the user question text and the candidate question text into the first text vector and the second text vector.
  • in step 204, RNN and CNN may be used to encode the first text vector and the second text vector to obtain a first encoding result for the first text vector and a second encoding result for the second text vector.
  • in step 206, the similarity between the user question text and the candidate question text may be determined based on the first encoding result and the second encoding result. As mentioned earlier, the similarity can be used to determine the response to the user's question text.
  • an RNN is a neural network that takes sequence data as input, recurses along the direction of the sequence, and connects all its nodes in a chain.
  • a CNN is a feedforward neural network that includes convolution calculations and has a deep structure, and is one of the representative algorithms of deep learning.
  • a CNN usually has representation learning ability and can classify input information according to its hierarchical structure. Accordingly, an RNN is usually good at capturing the sequence information of a text, while a CNN is usually good at capturing the key word and phrase information of a text.
  • both RNN and CNN are used to encode the user question text and the candidate question text. Combining the advantages of RNN and CNN can more fully capture the fine-grained feature information of these two texts, thereby determining the similarity between the two texts more accurately, so that the intelligent question answering system can respond to the user's question more accurately, greatly improving the user experience.
  • RNN can be implemented by various applicable structures.
  • the RNN can include a Bi-directional Long Short-Term Memory (BiLSTM) network, a Bi-directional Gated Recurrent Unit (BiGRU) network, and the like.
  • BiGRU may be used for encoding in step 204. In practice, it is found that BiGRU achieves a better encoding effect with higher efficiency.
  • RNN may be used for initial encoding, and then CNN may be used for secondary encoding on the basis of RNN encoding.
  • the RNN can capture the sequence feature information of the text, and the CNN can capture the word granularity feature information of the text, so as to achieve a deeper encoding of the text, thereby effectively improving the accuracy of similarity determination.
  • the RNN may be used to encode the first text vector to obtain the first initial encoding vector. Then, use CNN to encode the first initial coding vector to obtain the first secondary coding vector. In this way, the first encoding result may include the first initial encoding vector and the first secondary encoding vector.
  • the RNN can be used to encode the second text vector to obtain the second initial encoding vector. Then, use CNN to encode the second initial coding vector to obtain the second secondary coding vector.
  • the second encoding result may include the second initial encoding vector and the second secondary encoding vector.
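The two-stage encoding just described can be illustrated with a minimal NumPy sketch. The GRU weights are random placeholders and a single kernel-size-1 projection stands in for the CNN secondary encoder, so this shows shapes and data flow under those assumptions rather than the trained model.

```python
import numpy as np

# Sketch of initial encoding (bidirectional GRU) followed by a secondary
# encoding (pointwise convolution + ReLU). All weights are random
# placeholders; the point is the data flow, not learned behaviour.

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_direction(x, h_dim, rng):
    """Run a single-direction GRU over x of shape (seq_len, in_dim)."""
    in_dim = x.shape[1]
    Wz, Wr, Wh = (rng.normal(0, 0.1, (in_dim + h_dim, h_dim)) for _ in range(3))
    h = np.zeros(h_dim)
    outputs = []
    for x_t in x:
        xh = np.concatenate([x_t, h])
        z = sigmoid(xh @ Wz)                              # update gate
        r = sigmoid(xh @ Wr)                              # reset gate
        h_tilde = np.tanh(np.concatenate([x_t, r * h]) @ Wh)
        h = (1 - z) * h + z * h_tilde
        outputs.append(h)
    return np.stack(outputs)

def bigru_encode(x, h_dim, rng):
    """Initial encoding: concatenate forward and backward GRU states."""
    fwd = gru_direction(x, h_dim, rng)
    bwd = gru_direction(x[::-1], h_dim, rng)[::-1]
    return np.concatenate([fwd, bwd], axis=1)             # (seq_len, 2*h_dim)

seq_len, emb_dim, h_dim = 5, 8, 4
text_vec = rng.normal(size=(seq_len, emb_dim))            # e.g. a first text vector
initial = bigru_encode(text_vec, h_dim, rng)              # initial encoding vector
W_point = rng.normal(0, 0.1, (2 * h_dim, 6))
secondary = np.maximum(initial @ W_point, 0.0)            # kernel-size-1 conv + ReLU
```

The candidate question text's second text vector would be encoded the same way to produce the second initial and secondary encoding vectors.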
  • CNN can be implemented by various applicable structures.
  • CNN can be implemented based on the "Network in Network" idea.
  • a CNN may include at least two levels of convolutional layers.
  • the first-level convolutional layer may include at least one CNN unit with a convolution kernel size of 1.
  • each convolutional layer after the first-level convolutional layer may include at least one CNN unit with a convolution kernel size greater than 1.
  • This CNN structure can extract the word granularity feature information of the text more efficiently and conveniently. Moreover, due to the unique parameter sharing mechanism of the CNN convolution kernel, the amount of parameters can also be reduced, thereby reducing the computational complexity.
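The "Network in Network" style structure above can be sketched as a first level of kernel-size-1 convolutions followed by a level with a larger kernel (size 3 here, an arbitrary choice), with average and max pooling over the sequence. Weights are random placeholders, so this only demonstrates the structure.

```python
import numpy as np

# Two-level CNN sketch: level 1 uses kernel size 1 (pointwise), level 2 a
# kernel of size 3, then the pooling layer averages/maxes over the sequence
# and the results are concatenated into the secondary encoding vector.

rng = np.random.default_rng(1)

def conv1d(x, kernel):
    """x: (seq_len, in_ch); kernel: (k, in_ch, out_ch). 'Valid' 1-D conv + ReLU."""
    k = kernel.shape[0]
    windows = np.stack([x[i:i + k].reshape(-1) for i in range(len(x) - k + 1)])
    return np.maximum(windows @ kernel.reshape(-1, kernel.shape[2]), 0.0)

seq_len, in_ch = 7, 8
initial_vec = rng.normal(size=(seq_len, in_ch))            # from the BiGRU encoder

level1 = conv1d(initial_vec, rng.normal(0, 0.1, (1, in_ch, 6)))  # kernel size 1
level2 = conv1d(level1, rng.normal(0, 0.1, (3, 6, 6)))           # kernel size 3

# Pooling layer: average and max over the sequence dimension, then concatenate.
pooled = np.concatenate([level1.mean(0), level1.max(0),
                         level2.mean(0), level2.max(0)])
```

Because the level-1 units are pointwise, they mix channels without widening the receptive field, which is what keeps the parameter count low.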
  • the similarity between the two texts can be determined based on the encoding result.
  • the interactive information between two texts can be fully captured based on the soft attention mechanism.
  • the first encoding result and the second encoding result may be processed based on the soft attention mechanism to obtain the first interactive representation result for the user question text and the second interactive representation result for the candidate question text.
  • based on the first initial coding vector and the second initial coding vector, the first relative attention vector of the user question text relative to the candidate question text and the second relative attention vector of the candidate question text relative to the user question text may be determined.
  • the soft attention weight matrix between the user question text and the candidate question text can be calculated. Then, the soft attention weight matrix and the second initial coding vector can be used to calculate the first relative attention vector. Similarly, the soft attention weight matrix and the first initial coding vector can be used to calculate the second relative attention vector.
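A NumPy sketch of this soft attention step follows. The dot-product form of the weight matrix is an assumption consistent with common soft attention formulations, since the equation itself is not reproduced in this text.

```python
import numpy as np

# Soft attention sketch: the weight matrix holds pairwise dot products of
# the two initial encodings, and each text's relative attention vector is a
# softmax-weighted sum of the other text's initial encoding.

rng = np.random.default_rng(2)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

la, lb, dim = 4, 6, 8
a = rng.normal(size=(la, dim))   # first initial coding vector (user question)
b = rng.normal(size=(lb, dim))   # second initial coding vector (candidate)

e_ij = a @ b.T                               # soft attention weight matrix
a_tilde = softmax(e_ij, axis=1) @ b          # first relative attention vector
b_tilde = softmax(e_ij, axis=0).T @ a        # second relative attention vector
```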
  • the first relative attention vector, the first initial coding vector, and the first secondary coding vector can be processed to obtain the first interactive representation result.
  • a series of splicing operations may be performed on the first relative attention vector and the first initial coding vector to obtain the first splicing result. Then, average pooling (Avg Pooling) and maximum pooling (Max Pooling) can be performed on the first splicing result, respectively, to obtain the first pooling result. After that, the first pooling result and the first secondary encoding vector can be further spliced to obtain the first interactive representation result.
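The splicing and pooling operations can be sketched as follows. The difference and element-wise product features included in the splice are an assumption (a common choice) where the text only says "a series of splicing operations"; the inputs are random placeholders.

```python
import numpy as np

# Splice the initial encoding with its relative attention vector (plus
# assumed difference and product features), pool over the sequence with
# average and max pooling, then splice with the secondary encoding vector.

rng = np.random.default_rng(3)

la, dim, sec_dim = 4, 8, 6
a = rng.normal(size=(la, dim))          # first initial coding vector
a_tilde = rng.normal(size=(la, dim))    # first relative attention vector
a_sec = rng.normal(size=(sec_dim,))     # first secondary encoding vector

v_a = np.concatenate([a, a_tilde, a - a_tilde, a * a_tilde], axis=1)  # splice
pooled = np.concatenate([v_a.mean(axis=0), v_a.max(axis=0)])  # avg + max pooling
o_a = np.concatenate([pooled, a_sec])   # first interactive representation result
```

The second interactive representation result is obtained the same way from the candidate question text's vectors.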
  • the second relative attention vector, the second initial coding vector, and the second secondary coding vector can be processed through splicing and pooling operations to obtain the second interactive representation result.
  • a series of splicing operations may be performed on the second relative attention vector and the second initial coding vector to obtain the second splicing result. Then, average pooling and maximum pooling can be performed on the second splicing result to obtain the second pooling result. After that, the second pooling result and the second secondary encoding vector can be further spliced to obtain the second interactive representation result.
  • the similarity can be determined based on the first interaction representation result and the second interaction representation result.
  • the first interaction representation result and the second interaction representation result can be fused based on the threshold mechanism to obtain the fusion result.
  • the threshold mechanism can be used to select the parts of the first interaction representation result and the second interaction representation result that can be fused, and correspondingly to filter out the parts that cannot be fused. In this way, the more important information in the two interactive representation results can be better integrated.
  • the probability that the user question text is similar to the candidate question text can be determined as the similarity between the two.
  • Fig. 3A is a schematic structural diagram of a similarity model according to an embodiment.
  • the similarity model 300A of FIG. 3A can be used to implement the process described in FIG. 2 above.
  • the similarity model 300A may include a coding layer, an interactive presentation layer, a fusion layer, and a fully connected layer.
  • the coding layer may include a BiGRU encoder and a CNN encoder.
  • the BiGRU encoder can respectively perform initial encoding on the first text vector and the second text vector input to the similarity model 300A.
  • the first text vector can be used to represent the user question text.
  • the first text vector may be obtained by vectorizing the user question text, and it may be a sequence vector.
  • the second text vector can be used to represent the candidate question text.
  • the candidate question text may be one of multiple question texts recalled from the knowledge base.
  • the second text vector may be obtained by vectorizing the candidate question text, and it may be a sequence vector.
  • the BiGRU encoder is used to encode the first text vector and the second text vector respectively to capture the sequence feature information of the user question text and the candidate question text.
  • the calculation process of the BiGRU encoder can be expressed by the following equations: ā_i = BiGRU(a, i), i ∈ [1, ..., l_a]; b̄_j = BiGRU(b, j), j ∈ [1, ..., l_b], where a and b denote the first and second text vectors.
  • l a can represent the length of the user question text.
  • the length of the user question text can be expressed by the number of words in the user question text.
  • l b can represent the length of the candidate question text.
  • the length of the candidate question text can be represented by the number of characters in the candidate question text.
  • the CNN encoder can be used for secondary encoding.
  • the calculation process of the CNN encoder can be expressed by the following equation:
  • the CNN encoder can be implemented with a "Network in Network” structure, so as to better capture the word granularity feature information.
  • FIG. 3B is a schematic structural diagram of a CNN encoder according to an embodiment.
  • the CNN encoder 300B may include two levels of convolutional layers.
  • the first-level convolutional layer can process the initial coding vector from the BiGRU encoder (for example, the aforementioned first initial coding vector and the second initial coding vector).
  • the second-level convolutional layer can process the output results of the corresponding CNN units in the first-level convolutional layer.
  • Each CNN unit can be implemented using various applicable functions, such as the ReLU activation function.
  • the CNN encoder 300B may also include a pooling layer and a connection layer.
  • the pooling layer can perform average pooling and maximum pooling operations on the results from all levels of convolutional layers.
  • the connection layer can concatenate the pooling results to obtain the secondary code vector.
  • the CNN encoder may include other numbers of convolutional layers and other numbers of CNN units, and the CNN units may have other convolution kernel sizes, which is not limited in this specification.
  • the initial coding vector and the secondary coding vector obtained by the coding layer can be processed based on the soft attention mechanism, so as to fully capture the interactive information between the two texts, such as similar and dissimilar information.
  • the first initial coding vector and the second initial coding vector can be used to calculate the soft attention weight matrix e_ij, for example as the dot product e_ij = ā_i · b̄_j of the i-th initial encoding vector of the user question text and the j-th initial encoding vector of the candidate question text.
  • the soft attention weight matrix and the second secondary coding vector can be used to calculate the first relative attention vector of the user question text relative to the candidate question text
  • the soft attention weight matrix and the first secondary coding vector can be used to calculate the second relative attention vector of the candidate question text relative to the user question text
  • the first relative attention vector and the first initial coding vector may be spliced to obtain the first splicing result v a .
  • the second relative attention vector and the second initial coding vector can be spliced to obtain the second splicing result v b .
  • the pooling result and the secondary encoding vector can be further spliced to obtain the first interaction representation result O a and the second interaction representation result O b .
  • This process can be expressed by the following equation:
  • the first interaction representation result and the second interaction representation result can be fused.
  • the threshold mechanism can automatically learn the part that can be merged (or cannot be merged) in the first interaction representation result and the second interaction representation result, thereby effectively fusing the more important information in the two results.
  • the fusion process can be expressed by the following equation:
  • o′_a = g(o_a, o_b) ⊙ m(o_a, o_b) + (1 − g(o_a, o_b)) ⊙ o_a
  • o′_b = g(o_b, o_a) ⊙ m(o_b, o_a) + (1 − g(o_b, o_a)) ⊙ o_b
  • P can correspond to o_a, and Q can correspond to o_b;
  • W_f and b_f can represent trainable parameters;
  • m can represent the fusion function of the fusion layer;
  • g can represent the threshold function that implements the threshold mechanism;
  • the process of computing o′_b is the same, except that the positions of the two inputs to the fusion function and to the threshold function are swapped;
  • o_m can represent the fusion result of o′_a and o′_b.
  • the threshold function can be implemented by various suitable algorithms.
  • the threshold function can be expressed by the following equation:
  • g(P, Q) = σ(w_g ⋅ [P; Q; P ⊙ Q; P − Q] + b_g).
  • w_g can represent a trainable weight vector;
  • b_g can represent a trainable bias vector;
  • σ can represent a sigmoid function.
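A NumPy sketch of the threshold-mechanism fusion is below. It follows the two fusion equations and the threshold function above, with the fusion function m assumed to be a simple trainable transformation of the same spliced features, since its exact form is not given; all weights are random placeholders.

```python
import numpy as np

# Gated fusion sketch: g gates element-wise between the fused representation
# m(P, Q) and the original input, per the equations above.

rng = np.random.default_rng(4)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

dim = 6
o_a = rng.normal(size=dim)   # first interactive representation result
o_b = rng.normal(size=dim)   # second interactive representation result

W_g = rng.normal(0, 0.1, (4 * dim, dim))
b_g = np.zeros(dim)
W_f = rng.normal(0, 0.1, (4 * dim, dim))
b_f = np.zeros(dim)

def features(P, Q):
    # Spliced features [P; Q; P ⊙ Q; P − Q], matching the threshold function.
    return np.concatenate([P, Q, P * Q, P - Q])

def g(P, Q):                 # threshold (gate) function, values in (0, 1)
    return sigmoid(features(P, Q) @ W_g + b_g)

def m(P, Q):                 # fusion function (assumed form)
    return np.tanh(features(P, Q) @ W_f + b_f)

o_a_prime = g(o_a, o_b) * m(o_a, o_b) + (1 - g(o_a, o_b)) * o_a
o_b_prime = g(o_b, o_a) * m(o_b, o_a) + (1 - g(o_b, o_a)) * o_b
```

Where the gate output is near 1 the fused information dominates; where it is near 0 the original interactive representation passes through unchanged.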
  • in the coding layer, both the BiGRU encoder and the CNN encoder are used for encoding, which can effectively capture the sequence information and word-granularity information of the two texts; in the interactive representation layer, the interactive information of the two texts can be fully captured based on the soft attention mechanism; and in the fusion layer, the more important parts of the interactive representation results can be advantageously fused based on the threshold mechanism. Therefore, the similarity model of this embodiment can accurately obtain the similarity between the two texts, so as to improve the accuracy of the answers to users' questions.
  • the similarity model is a lightweight model, which is more efficient in implementation.
  • Fig. 4 is a schematic block diagram of an apparatus for text processing according to an embodiment.
  • the apparatus 400 may include a receiving module 402, an encoding module 404, and a determining module 406.
  • the receiving module 402 may receive the first text vector and the second text vector.
  • the first text vector can be used to represent the user question text
  • the second text vector can be used to represent the candidate question text.
  • the candidate question text may be obtained from a knowledge base, and the knowledge base may include at least one existing question text and an answer text for each existing question text.
  • the encoding module 404 may use RNN and CNN to encode the first text vector and the second text vector to obtain the first encoding result for the first text vector and the second encoding result for the second text vector.
  • the determining module 406 may determine the similarity between the user question text and the candidate question text based on the first encoding result and the second encoding result, where the similarity is used to determine a response to the user question text.
  • both RNN and CNN are used to encode the user question text and the candidate question text. Combining the advantages of RNN and CNN can more fully capture the fine-grained feature information of these two texts, which makes it possible to determine the similarity between the two texts more accurately, so that the intelligent question answering system can respond to the user's question more accurately, thereby greatly improving the user experience.
  • the first encoding result may include a first initial encoding vector and a first secondary encoding vector.
  • the second encoding result may include a second initial encoding vector and a second secondary encoding vector.
  • the encoding module 404 may use RNN to encode the first text vector to obtain the first initial coding vector, and use CNN to encode the first initial coding vector to obtain the first secondary coding vector.
  • the encoding module 404 may use RNN to encode the second text vector to obtain the second initial coding vector, and use CNN to encode the second initial coding vector to obtain the second secondary coding vector.
  • the determining module 406 may process the first encoding result and the second encoding result based on the soft attention mechanism to obtain the first interaction representation result for the user question text and the second interaction representation result for the candidate question text.
  • the determining module 406 may determine the similarity based on the first interaction representation result and the second interaction representation result.
  • the determining module 406 may determine, based on the first initial coding vector and the second initial coding vector, the first relative attention vector of the user question text relative to the candidate question text and the second relative attention vector of the candidate question text relative to the user question text.
  • the determining module 406 may process the first relative attention vector, the first initial coding vector, and the first secondary coding vector through a splicing operation and a pooling operation to obtain the first interactive representation result.
  • the determining module 406 may process the second relative attention vector, the second initial coding vector, and the second secondary coding vector through a splicing operation and a pooling operation, to obtain a second interactive representation result.
  • the determining module 406 may fuse the first interaction representation result and the second interaction representation result based on a threshold mechanism to obtain the fusion result, where the threshold mechanism is used to select the parts of the first interaction representation result and the second interaction representation result that can be fused.
  • the determining module 406 may determine the probability that the user question text is similar to the candidate question text based on the fusion result, as the similarity.
  • the CNN may include at least two levels of convolutional layers.
  • the first-level convolutional layer includes at least one CNN unit with a convolution kernel size of 1, and each convolutional layer after the first-level convolutional layer includes at least one CNN unit with a convolution kernel size greater than 1.
  • the RNN may include BiGRU.
  • Each module of the apparatus 400 can execute the corresponding steps in the method embodiments of FIGS. 1 to 3B. Therefore, for brevity of description, the specific operations and functions of each module of the apparatus 400 will not be repeated here.
  • the foregoing apparatus 400 may be implemented by hardware, or by software, or by a combination of software and hardware.
  • when the apparatus 400 is implemented by software, it can be formed by the processor of the device in which it is located reading the corresponding executable code from a storage (such as a non-volatile memory) into memory and running it.
  • Fig. 5 is a hardware structure diagram of a computing device for text processing according to an embodiment.
  • the computing device 500 may include at least one processor 502, a memory 504, an internal memory 506, and a communication interface 508, which are connected together via a bus 510.
  • The at least one processor 502 executes the executable code stored or encoded in the memory 504 (i.e., the above-mentioned elements implemented in the form of software).
  • the executable code stored in the memory 504, when executed by the at least one processor 502, enables the computing device to implement the various processes described above in conjunction with FIGS. 1-3B.
  • the computing device 500 can be implemented in any suitable form in the art, for example, it includes, but is not limited to, a desktop computer, a laptop computer, a smart phone, a tablet computer, a consumer electronic device, a wearable smart device, and so on.
  • the embodiments of this specification also provide a machine-readable storage medium.
  • the machine-readable storage medium may store executable code, and when the executable code is executed by a machine, the machine implements the specific process of the method embodiment described above with reference to FIGS. 1-3B.
  • the machine-readable storage medium may include, but is not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), static random access memory (SRAM), hard disk, flash memory, etc.
  • the device structure described in the foregoing embodiments may be a physical structure or a logical structure; that is, some units may be implemented by the same physical entity, some units may be implemented respectively by multiple physical entities, or some units may be implemented jointly by certain components in multiple independent devices.

Abstract

A method, apparatus, computing device, and machine-readable storage medium for text processing. The method includes: receiving a first text vector and a second text vector, where the first text vector represents a user question text, the second text vector represents a candidate question text, and the candidate question text is obtained from a knowledge base (202); encoding the first text vector and the second text vector using an RNN and a CNN to obtain a first encoding result for the first text vector and a second encoding result for the second text vector (204); and determining, based on the first encoding result and the second encoding result, a similarity between the user question text and the candidate question text, where the similarity is used to determine a reply to the user question text (206).

Description

Method and Apparatus for Text Processing

Technical Field

The embodiments of this specification relate to the field of information technology, and more specifically, to a method, apparatus, computing device, and machine-readable storage medium for text processing.

Background

With the continuous development of machine learning and other technologies, intelligent question-answering (QA) systems have been developed to make it convenient for users to obtain help. An intelligent QA system enables human-machine dialogue: for example, a user poses a question to the system, and the system answers it automatically. Intelligent QA systems thus not only provide great convenience but also reduce the cost of answering questions manually. However, making an intelligent QA system answer users' questions accurately remains one of the problems to be solved.
Summary

In one aspect, the embodiments of this specification provide a method for text processing, including: receiving a first text vector and a second text vector, where the first text vector represents a user question text, the second text vector represents a candidate question text, the candidate question text is obtained from a knowledge base, and the knowledge base includes at least one existing question text and an answer text for each existing question text; encoding the first text vector and the second text vector using a recurrent neural network (RNN) and a convolutional neural network (CNN) to obtain a first encoding result for the first text vector and a second encoding result for the second text vector; and determining, based on the first encoding result and the second encoding result, a similarity between the user question text and the candidate question text, where the similarity is used to determine a reply to the user question text.

In another aspect, the embodiments of this specification provide an apparatus for text processing, including: a receiving unit that receives a first text vector and a second text vector, where the first text vector represents a user question text, the second text vector represents a candidate question text, the candidate question text is obtained from a knowledge base, and the knowledge base includes at least one existing question text and an answer text for each existing question text; an encoding unit that encodes the first text vector and the second text vector using an RNN and a CNN to obtain a first encoding result for the first text vector and a second encoding result for the second text vector; and a determining unit that determines, based on the first encoding result and the second encoding result, a similarity between the user question text and the candidate question text, where the similarity is used to determine a reply to the user question text.

In another aspect, the embodiments of this specification provide a computing device, including: at least one processor; and a memory in communication with the at least one processor, the memory storing executable code that, when executed by the at least one processor, causes the at least one processor to implement the above method.

In another aspect, the embodiments of this specification provide a machine-readable storage medium storing executable code that, when executed, causes a machine to perform the above method.
Brief Description of the Drawings

The above and other objects, features, and advantages of the embodiments of this specification will become more apparent from the following more detailed description of the embodiments in conjunction with the accompanying drawings, in which the same reference numerals generally represent the same elements.

Fig. 1 is a schematic flowchart of the operation of an intelligent QA system according to an embodiment.

Fig. 2 is a schematic flowchart of a method for text processing according to an embodiment.

Fig. 3A is a schematic structural diagram of a similarity model according to an embodiment.

Fig. 3B is a schematic structural diagram of a CNN encoder according to an embodiment.

Fig. 4 is a schematic block diagram of an apparatus for text processing according to an embodiment.

Fig. 5 is a hardware structure diagram of a computing device for text processing according to an embodiment.
Detailed Description

The subject matter described herein will now be discussed with reference to various embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and implement the subject matter described herein, and are not limitations on the scope of protection, applicability, or examples set forth in the claims. The function and arrangement of the elements discussed may be changed without departing from the scope of protection of the claims. Various embodiments may omit, replace, or add various procedures or components as needed.

As used herein, the term "include" and its variants are open terms, meaning "including but not limited to." The term "based on" means "based at least in part on." The terms "one embodiment" and "an embodiment" mean "at least one embodiment." The term "another embodiment" means "at least one other embodiment." The terms "first," "second," and the like may refer to different or the same objects. Other definitions, whether explicit or implicit, may be included below; unless clearly indicated by the context, the definition of a term is consistent throughout the specification.
With the rapid development of artificial intelligence technologies such as machine learning and deep learning, intelligent QA systems have emerged. At present, most intelligent QA systems answer questions by retrieving from an existing knowledge base. The knowledge base can be represented in the form of question-answer pairs; that is, it can include at least one question text and the answer texts corresponding to those question texts. Typically, the knowledge base is built manually offline. For example, question texts are extracted from a large-scale dialogue corpus, and the corresponding answer texts are then supplemented manually to build the knowledge base.

The question texts in the knowledge base may include standard question texts and extended question texts. For example, a standard question text may be "How do I repay Huabei?", and extended question texts may be "How can Huabei be repaid?", "What are the repayment methods for Huabei?", "How do I make a repayment to Huabei?", and so on.

Upon receiving a user question text, the intelligent QA system can retrieve question texts similar to the user question text from the knowledge base, and then output a reply to the user question text based on the question texts with high similarity to it. For ease of understanding, the intelligent QA system is described below with specific examples.
Fig. 1 is a schematic flowchart of the operation of an intelligent QA system according to an embodiment.

As shown in Fig. 1, at 102, the intelligent QA system may receive a user question text. For example, the user's question may be received as text or as speech; in the case of speech, the received speech can be converted into text form.

At 104, the intelligent QA system may search the knowledge base to obtain N candidate question texts associated with the user question text, where N is a positive integer.

Typically, the knowledge base contains a large number of question texts (for example, hundreds of thousands). To reduce the complexity of the similarity computation and speed up the response, the intelligent QA system may first obtain N candidate question texts from the knowledge base through a simple retrieval method such as keyword matching, and then perform the subsequent similarity computation on the N candidates. This process may also be called candidate question recall. N can be preset based on actual needs, experience, and so on.
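The keyword-matching recall step described above can be illustrated with a minimal sketch. This is not code from the patent; the whitespace tokenisation and overlap score stand in for whatever keyword index a production system would use:

```python
import heapq

def recall_candidates(user_question, knowledge_base, n=3):
    """Recall the top-n candidate questions by keyword (token) overlap.

    `knowledge_base` is a list of question strings; tokenisation here is a
    naive lowercase whitespace split, standing in for a real keyword index.
    """
    query_tokens = set(user_question.lower().split())

    def overlap(candidate):
        return len(query_tokens & set(candidate.lower().split()))

    # Keep the n candidates sharing the most keywords with the query.
    return heapq.nlargest(n, knowledge_base, key=overlap)

kb = [
    "how do i repay the loan",
    "how to reset my password",
    "what is the repayment deadline",
    "how do i contact support",
]
print(recall_candidates("how do i repay my loan early", kb, n=2))
```

The recalled candidates would then go on to the finer-grained similarity computation described below.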
At 106, the intelligent QA system may vectorize the user question text and the candidate question texts to obtain their corresponding text vectors.

Here, various applicable vectorization algorithms may be used to convert the user question text and the candidate question texts into text vectors, such as word embedding algorithms.

At 108, the similarity between the user question text and each candidate question text may be determined based on the text vectors obtained at 106.

At 110, a reply to the user question text may be output based on the similarities between the user question text and the candidate question texts.

For example, the candidate question texts may be sorted in descending order of their similarity to the user question text.

In one implementation, the candidate question text with the highest similarity may be output together with its corresponding answer text as the reply to the user question text.

In another implementation, if the similarities between the user question text and the candidate question texts are all below a preset threshold, the top-ranked candidate question texts may be output as the reply to the user question text. The user can then select the desired question text among the output candidates, and the intelligent QA system can output the corresponding answer text based on the user's selection.
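The two reply strategies above can be sketched as follows; the function name, threshold value, and dictionary shapes are illustrative assumptions rather than anything specified by the patent:

```python
def build_reply(similarities, answers, threshold=0.9, top_k=3):
    """Turn (candidate, similarity) pairs into a reply, following the two
    strategies described above. `answers` maps candidate question -> answer.
    """
    ranked = sorted(similarities, key=lambda pair: pair[1], reverse=True)
    best_question, best_score = ranked[0]
    if best_score >= threshold:
        # Confident match: return the best question together with its answer.
        return {"answer": answers[best_question], "question": best_question}
    # Low confidence: let the user pick among the top-k candidate questions.
    return {"suggestions": [q for q, _ in ranked[:top_k]]}

sims = [("q1", 0.95), ("q2", 0.40), ("q3", 0.72)]
print(build_reply(sims, {"q1": "a1", "q2": "a2", "q3": "a3"}))
```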
It can be understood that Fig. 1 is only an example; in a specific implementation, the intelligent QA system may operate with other applicable processes.

As can be seen from the example of Fig. 1, similarity determination is crucial to whether the intelligent QA system can accurately output the reply the user expects. In view of this, the embodiments of this specification provide a text processing solution that can determine text similarity accurately, thereby greatly improving the answer accuracy of an intelligent QA system and helping to improve the user experience. This is described in detail below.
Fig. 2 is a schematic flowchart of a method for text processing according to an embodiment.

As shown in Fig. 2, in step 202, a first text vector and a second text vector may be received.

The first text vector may represent a user question text, and the second text vector may represent a candidate question text. For example, the candidate question text may be one of the N candidate question texts recalled from the knowledge base at 104 in Fig. 1.

The first text vector and the second text vector may be obtained by processing the user question text and the candidate question text with various applicable text vectorization algorithms. For example, a word embedding algorithm may be used to convert the user question text and the candidate question text into the first text vector and the second text vector.
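A word-embedding vectorization step of this kind can be sketched as follows, assuming a pre-trained embedding table; the tiny table and its dimension are made up for illustration:

```python
def vectorize(text, embeddings, dim=4):
    """Map a question text to a sequence of word vectors via an embedding
    table; unknown words fall back to a zero vector.
    """
    zero = [0.0] * dim
    return [embeddings.get(token, zero) for token in text.lower().split()]

embeddings = {
    "repay": [0.1, 0.2, 0.3, 0.4],
    "loan":  [0.5, 0.1, 0.0, 0.2],
}
first_text_vector = vectorize("repay loan", embeddings)
print(first_text_vector)  # one 4-dim vector per token
```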
In step 204, the first text vector and the second text vector may be encoded using an RNN and a CNN to obtain a first encoding result for the first text vector and a second encoding result for the second text vector.

In step 206, the similarity between the user question text and the candidate question text may be determined based on the first encoding result and the second encoding result. As mentioned above, the similarity may be used to determine a reply to the user question text.

An RNN is a neural network that takes sequence data as input, recurses along the direction of the sequence, and connects all nodes in a chain. A CNN is a feed-forward neural network that involves convolution computation and has a deep structure, and is one of the representative algorithms of deep learning. A CNN typically has representation-learning capability and can perform shift-invariant classification of input information according to its hierarchical structure. Therefore, an RNN is generally good at capturing the sequence information of a text, while a CNN is generally good at capturing the keyword and phrase information of a text.

Therefore, in this embodiment, encoding the user question text and the candidate question text with both an RNN and a CNN combines the advantages of both, so that the fine-grained feature information of the two texts can be captured more fully. The similarity between the two texts can thus be determined more accurately, enabling the intelligent QA system to answer the user's question more accurately and greatly improving the user experience.

In the embodiments of this specification, the RNN may be implemented with various applicable structures; for example, the RNN may include a bi-directional long short-term memory network (BiLSTM), a bi-directional gated recurrent unit (BiGRU), and so on. In a preferred embodiment, a BiGRU may be used for encoding in step 204. In practice, the BiGRU has been found to encode better and more efficiently.
In one embodiment, in step 204, the RNN may be used for initial encoding, and the CNN may then be used for secondary encoding on top of the RNN encoding. In this way, the RNN captures the sequence feature information of the text, while the CNN captures the word-granularity feature information, achieving a deeper encoding of the text and effectively improving the accuracy of the similarity determination.

Specifically, the RNN may encode the first text vector to obtain a first initial encoding vector, and the CNN may then encode the first initial encoding vector to obtain a first secondary encoding vector. The first encoding result may thus include the first initial encoding vector and the first secondary encoding vector.

Similarly, the RNN may encode the second text vector to obtain a second initial encoding vector, and the CNN may then encode the second initial encoding vector to obtain a second secondary encoding vector. The second encoding result may thus include the second initial encoding vector and the second secondary encoding vector.

In the embodiments of this specification, the CNN may be implemented with various applicable structures. Preferably, the CNN may be implemented based on the "Network in Network" idea. For example, the CNN may include at least two levels of convolutional layers: the first-level convolutional layer may include at least one CNN unit with a kernel size of 1, and each convolutional layer after the first level may include at least one CNN unit with a kernel size greater than 1. Specifically, in the first-level convolutional layer, every CNN unit has a kernel size of 1, while in the convolutional layers after the first level, the kernel sizes of the CNN units may be greater than 1.

Such a CNN structure can extract the word-granularity feature information of a text more efficiently and conveniently. Moreover, thanks to the parameter-sharing mechanism peculiar to CNN kernels, the number of parameters is reduced, lowering the computational complexity.
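The kernel-size-1-then-larger-kernel idea can be illustrated with scalar 1-D convolutions; the weights below are arbitrary illustration values, not trained parameters:

```python
def conv1d(sequence, kernel):
    """Valid 1-D convolution over a sequence of scalars."""
    k = len(kernel)
    return [
        sum(kernel[j] * sequence[i + j] for j in range(k))
        for i in range(len(sequence) - k + 1)
    ]

# "Network in Network" style stack: a kernel-size-1 layer first (a purely
# per-position transform), followed by a kernel-size-2 layer that mixes
# neighbouring positions (word/phrase granularity).
hidden = conv1d([1.0, 2.0, 3.0, 4.0], kernel=[0.5])   # size-1 kernel
output = conv1d(hidden, kernel=[1.0, -1.0])           # size-2 kernel
print(hidden, output)
```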
After the first text vector and the second text vector are encoded, the similarity between the two texts can be determined based on the encoding results.

In one embodiment, in step 206, a soft attention mechanism may be used to fully capture the interaction information between the two texts, such as information on where the two texts are similar or dissimilar. Specifically, the first encoding result and the second encoding result may be processed based on the soft attention mechanism to obtain a first interaction representation result for the user question text and a second interaction representation result for the candidate question text.

In one embodiment, based on the first initial encoding vector and the second initial encoding vector, a first relative attention vector of the user question text with respect to the candidate question text and a second relative attention vector of the candidate question text with respect to the user question text may be determined.

For example, a soft attention weight matrix between the user question text and the candidate question text may be computed from the first initial encoding vector and the second initial encoding vector. The first relative attention vector may then be computed using the soft attention weight matrix and the second initial encoding vector; similarly, the second relative attention vector may be computed using the soft attention weight matrix and the first initial encoding vector.
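The soft attention computation described above can be sketched in a few lines, using standard dot-product soft attention over two encoded sequences; this is an illustrative reading of the mechanism, not the patent's exact formulation:

```python
import math

def soft_attention(a, b):
    """For two encoded sequences `a` and `b` (lists of equal-dimension
    vectors), return the attention-weighted view of `b` for every position
    of `a` -- a sketch of the 'relative attention vector' described above.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    attended = []
    for a_i in a:
        scores = [dot(a_i, b_j) for b_j in b]        # one row of the weight matrix
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # softmax over positions of b
        z = sum(weights)
        weights = [w / z for w in weights]
        attended.append([
            sum(w * b_j[d] for w, b_j in zip(weights, b))
            for d in range(len(b[0]))
        ])
    return attended

a = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0, 0.0], [0.0, 1.0]]
print(soft_attention(a, b))
```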
Then, the first relative attention vector, the first initial encoding vector, and the first secondary encoding vector may be processed through concatenation and pooling operations to obtain the first interaction representation result.

For example, a series of concatenation operations may be performed on the first relative attention vector and the first initial encoding vector to obtain a first concatenation result. Average pooling and max pooling may then be applied to the first concatenation result to obtain a first pooling result. After that, the first pooling result may be further concatenated with the first secondary encoding vector to obtain the first interaction representation result.

Similarly, the second relative attention vector, the second initial encoding vector, and the second secondary encoding vector may be processed through concatenation and pooling operations to obtain the second interaction representation result.

For example, a series of concatenation operations may be performed on the second relative attention vector and the second initial encoding vector to obtain a second concatenation result. Average pooling and max pooling may then be applied to the second concatenation result to obtain a second pooling result. After that, the second pooling result may be further concatenated with the second secondary encoding vector to obtain the second interaction representation result.
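The concatenation-and-pooling step can be sketched as follows; the exact set of concatenated features is an assumption made for illustration:

```python
def interaction_representation(attended, initial, secondary):
    """Concatenate the relative attention vector with the initial encoding
    at each position, average/max pool over positions, then append the
    (already pooled) secondary encoding vector.
    """
    per_position = [att + enc for att, enc in zip(attended, initial)]  # concat
    dims = range(len(per_position[0]))
    avg_pool = [sum(v[d] for v in per_position) / len(per_position) for d in dims]
    max_pool = [max(v[d] for v in per_position) for d in dims]
    return avg_pool + max_pool + secondary

rep = interaction_representation(
    attended=[[1.0], [3.0]],
    initial=[[2.0], [4.0]],
    secondary=[9.0],
)
print(rep)
```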
After that, the similarity may be determined based on the first interaction representation result and the second interaction representation result.

For example, the first interaction representation result and the second interaction representation result may be fused based on a gating mechanism to obtain a fusion result. The gating mechanism may be used to select the portions of the two interaction representation results that can be fused; equivalently, it filters out the portions of the two interaction representation results that cannot be fused. In this way, the more important information in the two interaction representation results can be fused better.

Then, based on the fusion result, the probability that the user question text is similar to the candidate question text may be determined and used as the similarity between the two.

It can be understood that the process of Fig. 2 described above may be implemented by a model, referred to herein as the similarity model. This is described in further detail below with specific examples. It should be understood that the following examples are intended only to help those skilled in the art better understand the above content, and do not limit the scope of the above technical solution in any way.
Fig. 3A is a schematic structural diagram of a similarity model according to an embodiment. The similarity model 300A of Fig. 3A may be used to implement the process described above with reference to Fig. 2.

As shown in Fig. 3A, the similarity model 300A may include an encoding layer, an interaction representation layer, a fusion layer, and a fully connected layer.

(1) Encoding layer. The encoding layer may include a BiGRU encoder and a CNN encoder.

As mentioned above, the BiGRU encoder may perform initial encoding on the first text vector and the second text vector input to the similarity model 300A.

The first text vector may represent the user question text. For example, the first text vector may be obtained by vectorizing the user question text and may be a sequence vector.

The second text vector may represent the candidate question text. For example, the candidate question text may be one of multiple question texts recalled from the knowledge base. The second text vector may be obtained by vectorizing the candidate question text and may be a sequence vector.
<1> First, the BiGRU encoder encodes the first text vector and the second text vector separately, so as to capture the sequence feature information of the user question text and the candidate question text. The computation of the BiGRU encoder can be expressed by the following equations:

$\bar{a}_i = \mathrm{BiGRU}(a, i), \quad i \in \{1, \dots, l_a\}$

$\bar{b}_j = \mathrm{BiGRU}(b, j), \quad j \in \{1, \dots, l_b\}$

where $\bar{a} = (\bar{a}_1, \dots, \bar{a}_{l_a})$ may represent the first initial encoding vector and $l_a$ may represent the length of the user question text. Typically, the length of the user question text may be measured by the number of characters in it.

Similarly, $\bar{b} = (\bar{b}_1, \dots, \bar{b}_{l_b})$ may represent the second initial encoding vector and $l_b$ may represent the length of the candidate question text. Typically, the length of the candidate question text may be measured by the number of characters in it.
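A minimal GRU recurrence run in both directions illustrates what the BiGRU encoder computes per position; the one-dimensional hidden state and toy weights below are simplifications for illustration, not the patent's configuration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h_prev, x, w):
    """One scalar GRU step (1-dim hidden state, 1-dim input) with weights
    w = (wz, uz, wr, ur, wh, uh).
    """
    wz, uz, wr, ur, wh, uh = w
    z = sigmoid(wz * x + uz * h_prev)                 # update gate
    r = sigmoid(wr * x + ur * h_prev)                 # reset gate
    h_tilde = math.tanh(wh * x + uh * (r * h_prev))   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde

def bigru_encode(xs, w):
    """Run the GRU forward and backward over the sequence and return the
    concatenated [forward, backward] state per position.
    """
    fwd, h = [], 0.0
    for x in xs:
        h = gru_step(h, x, w)
        fwd.append(h)
    bwd, h = [], 0.0
    for x in reversed(xs):
        h = gru_step(h, x, w)
        bwd.append(h)
    bwd.reverse()
    return [[f, b] for f, b in zip(fwd, bwd)]

weights = (1.0, 0.0, 1.0, 0.0, 1.0, 0.0)  # toy values for illustration
print(bigru_encode([1.0, -1.0, 0.5], weights))
```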
<2> After encoding with the BiGRU encoder, the CNN encoder may be used for secondary encoding. For example, the computation of the CNN encoder can be expressed by the following equation:

$a' = \mathrm{CNN}(\bar{a})$

Here, $\bar{a}$ may represent the initial encoding vector output by the BiGRU, and $a'$ may represent the secondary encoding vector output by the CNN encoder.
Preferably, the CNN encoder may adopt a "Network in Network" structure to better capture word-granularity feature information. For example, Fig. 3B is a schematic structural diagram of a CNN encoder according to an embodiment.

In the example of Fig. 3B, the CNN encoder 300B may include two levels of convolutional layers. The first-level convolutional layer may include three CNN units with K=1, and the second-level convolutional layer may include a CNN unit with K=2 and a CNN unit with K=3, where K may represent the convolution kernel size.

The first-level convolutional layer may process the initial encoding vectors from the BiGRU encoder (for example, the first initial encoding vector and the second initial encoding vector described above). The second-level convolutional layer may process the output of the corresponding CNN units in the first-level convolutional layer. Each CNN unit may be implemented with various applicable functions, such as the ReLU activation function.

In addition, the CNN encoder 300B may further include a pooling layer and a concatenation layer. The pooling layer may perform average pooling and max pooling on the results from the convolutional layers, and the concatenation layer may concatenate the pooling results to obtain the secondary encoding vector.

It should be understood that although Fig. 3B shows two levels of convolutional layers and the CNN units included in each level, in practical applications the CNN encoder may include other numbers of convolutional layers and other numbers of CNN units, and the CNN units may have other kernel sizes; this specification does not limit this.
(2) Interaction representation layer. In the interaction representation layer, the initial encoding vectors and the secondary encoding vectors obtained at the encoding layer may be processed based on the soft attention mechanism, so as to fully capture the interaction information between the two texts, such as similar and dissimilar information.

For example, the soft attention weight matrix $e_{ij}$ may be computed from the first initial encoding vector and the second initial encoding vector:

$e_{ij} = \bar{a}_i^{\top} \bar{b}_j$

Then, the first relative attention vector $\tilde{a}_i$ of the user question text with respect to the candidate question text may be computed using the soft attention weight matrix and the second initial encoding vector; similarly, the second relative attention vector $\tilde{b}_j$ of the candidate question text with respect to the user question text may be computed using the soft attention weight matrix and the first initial encoding vector. This process can be expressed by the following equations:

$\tilde{a}_i = \sum_{j=1}^{l_b} \frac{\exp(e_{ij})}{\sum_{k=1}^{l_b} \exp(e_{ik})} \bar{b}_j$

$\tilde{b}_j = \sum_{i=1}^{l_a} \frac{\exp(e_{ij})}{\sum_{k=1}^{l_a} \exp(e_{kj})} \bar{a}_i$
Then, the first relative attention vector and the first initial encoding vector may be concatenated to obtain the first concatenation result $v_a$; similarly, the second relative attention vector and the second initial encoding vector may be concatenated to obtain the second concatenation result $v_b$. This process can be expressed by the following equations:

$v_{a,i} = [\bar{a}_i; \tilde{a}_i]$

$v_{b,j} = [\bar{b}_j; \tilde{b}_j]$

After that, average pooling and max pooling may be applied to the first concatenation result and the second concatenation result respectively:

$v_{a,\mathrm{avg}} = \frac{1}{l_a} \sum_{i=1}^{l_a} v_{a,i}, \quad v_{a,\mathrm{max}} = \max_{1 \le i \le l_a} v_{a,i}$

$v_{b,\mathrm{avg}} = \frac{1}{l_b} \sum_{j=1}^{l_b} v_{b,j}, \quad v_{b,\mathrm{max}} = \max_{1 \le j \le l_b} v_{b,j}$

Then, the pooling results may be further concatenated with the secondary encoding vectors to obtain the first interaction representation result $O_a$ and the second interaction representation result $O_b$:

$O_a = [v_{a,\mathrm{avg}}; v_{a,\mathrm{max}}; a']$

$O_b = [v_{b,\mathrm{avg}}; v_{b,\mathrm{max}}; b']$
(3) Fusion layer. In the fusion layer, the first interaction representation result and the second interaction representation result may be fused based on the gating mechanism. The gating mechanism can automatically learn which parts of the two interaction representation results can (or cannot) be fused, thereby effectively fusing the more important information in the two results.

For example, the fusion process can be expressed by the following equations:

$m(P, Q) = \tanh(W_f [P; Q; P \circ Q; P - Q] + b_f)$

$o'_a = g(o_a, o_b) \cdot m(o_a, o_b) + (1 - g(o_a, o_b)) \cdot o_a$

$o'_b = g(o_b, o_a) \cdot m(o_b, o_a) + (1 - g(o_b, o_a)) \cdot o_b$

$m_{out} = [o'_a; o'_b]$

where, when computing $o'_a$, $P$ corresponds to $O_a$ and $Q$ corresponds to $O_b$; $W_f$ and $b_f$ may represent trainable parameters; $m$ may represent the fusion function of the fusion layer; and $g$ may represent the gate function implementing the gating mechanism. The computation of $o'_b$ is similar, except that the two inputs of the fusion function and the gate function are swapped. $m_{out}$ may represent the fusion result of $o'_a$ and $o'_b$.
The gate function may be implemented with various applicable algorithms. In one embodiment, the gate function can be expressed by the following equation:

$g(P, Q) = \sigma(w_g \cdot [P; Q; P \circ Q; P - Q] + b_g)$

Note that for the fusion function $m(P, Q)$ and the gate function $g(P, Q)$, $m(P, Q) \neq m(Q, P)$ and $g(P, Q) \neq g(Q, P)$. Here, $w_g$ may represent a trainable vector, $b_g$ may represent a trainable bias vector, and $\sigma$ may represent the sigmoid function.
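A scalar sketch of the gated fusion can make the interplay of the fusion function $m$ and the gate function $g$ concrete; collapsing the feature vector $[P; Q; P \circ Q; P - Q]$ to a sum and using fixed scalar weights are illustrative simplifications, not the trained model:

```python
import math

def gated_fusion(p, q, w_f=0.5, b_f=0.0, w_g=1.0, b_g=0.0):
    """Mix a fusion value m(P, Q) with the original representation P
    according to a sigmoid gate g(P, Q), as in the equations above.
    """
    features = p + q + p * q + (p - q)                    # stand-in for [P;Q;P*Q;P-Q]
    m = math.tanh(w_f * features + b_f)                   # fusion function
    g = 1.0 / (1.0 + math.exp(-(w_g * features + b_g)))   # gate in (0, 1)
    return g * m + (1.0 - g) * p

o_a, o_b = 0.8, 0.2
fused = [gated_fusion(o_a, o_b), gated_fusion(o_b, o_a)]  # m_out = [o'_a, o'_b]
print(fused)
```

Note that, as in the equations, swapping the two inputs changes the result: the fusion is deliberately asymmetric.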
(4) Fully connected layer. In the fully connected layer, the probability that the user question text and the candidate question text are similar may be computed based on the fusion result and used as the similarity between the two.

In this embodiment, the encoding layer encodes with both a BiGRU encoder and a CNN encoder, effectively capturing the sequence information and the word-granularity information of each text; the interaction representation layer fully captures the interaction information between the two texts based on the soft attention mechanism; and the fusion layer advantageously fuses the more important parts of the interaction representation results based on the gating mechanism. The similarity model of this embodiment can therefore determine the similarity between texts accurately, improving the precision of replies to user questions. In addition, the similarity model is a lightweight model and is more efficient to implement.
Fig. 4 is a schematic block diagram of an apparatus for text processing according to an embodiment.

As shown in Fig. 4, the apparatus 400 may include a receiving module 402, an encoding module 404, and a determining module 406.

The receiving module 402 may receive a first text vector and a second text vector. The first text vector may represent a user question text, and the second text vector may represent a candidate question text. The candidate question text may be obtained from a knowledge base, and the knowledge base may include at least one existing question text and an answer text for each existing question text.

The encoding module 404 may encode the first text vector and the second text vector using an RNN and a CNN to obtain a first encoding result for the first text vector and a second encoding result for the second text vector.

The determining module 406 may determine, based on the first encoding result and the second encoding result, a similarity between the user question text and the candidate question text, where the similarity is used to determine a reply to the user question text.

In this embodiment, encoding the user question text and the candidate question text with both an RNN and a CNN combines the advantages of both, so that the fine-grained feature information of the two texts can be captured more fully and the similarity between the two texts determined more accurately, enabling the intelligent QA system to answer the user's question more accurately and greatly improving the user experience.
In one embodiment, the first encoding result may include a first initial encoding vector and a first secondary encoding vector, and the second encoding result may include a second initial encoding vector and a second secondary encoding vector.

The encoding module 404 may encode the first text vector using the RNN to obtain the first initial encoding vector, and encode the first initial encoding vector using the CNN to obtain the first secondary encoding vector.

The encoding module 404 may encode the second text vector using the RNN to obtain the second initial encoding vector, and encode the second initial encoding vector using the CNN to obtain the second secondary encoding vector.

In one embodiment, the determining module 406 may process the first encoding result and the second encoding result based on a soft attention mechanism to obtain a first interaction representation result for the user question text and a second interaction representation result for the candidate question text.

The determining module 406 may determine the similarity based on the first interaction representation result and the second interaction representation result.

In one embodiment, the determining module 406 may determine, based on the first initial encoding vector and the second initial encoding vector, a first relative attention vector of the user question text with respect to the candidate question text and a second relative attention vector of the candidate question text with respect to the user question text.

The determining module 406 may process the first relative attention vector, the first initial encoding vector, and the first secondary encoding vector through concatenation and pooling operations to obtain the first interaction representation result.

The determining module 406 may process the second relative attention vector, the second initial encoding vector, and the second secondary encoding vector through concatenation and pooling operations to obtain the second interaction representation result.

In one embodiment, the determining module 406 may fuse the first interaction representation result and the second interaction representation result based on a gating mechanism to obtain a fusion result, where the gating mechanism is used to select the portions of the two interaction representation results that can be fused.

The determining module 406 may determine, based on the fusion result, the probability that the user question text is similar to the candidate question text as the similarity.

In one embodiment, the CNN may include at least two levels of convolutional layers: among them, the first-level convolutional layer includes at least one CNN unit with a kernel size of 1, and each convolutional layer after the first level includes at least one CNN unit with a kernel size greater than 1.

In one embodiment, the RNN may include a BiGRU.
Each module of the apparatus 400 may perform the corresponding steps in the method embodiments of Figs. 1 to 3B; therefore, for brevity of description, the specific operations and functions of the units of the apparatus 400 are not repeated here.

The apparatus 400 may be implemented in hardware, in software, or in a combination of software and hardware. For example, when the apparatus 400 is implemented in software, it may be formed by the processor of the device where it resides reading the corresponding executable code from storage (such as a non-volatile storage) into memory and running it.

Fig. 5 is a hardware structure diagram of a computing device for text processing according to an embodiment. As shown in Fig. 5, the computing device 500 may include at least one processor 502, a memory 504, an internal memory 506, and a communication interface 508, and the at least one processor 502, the memory 504, the internal memory 506, and the communication interface 508 are connected together via a bus 510. The at least one processor 502 executes at least one executable code stored or encoded in the memory 504 (i.e., the elements implemented in the form of software described above).

In one embodiment, the executable code stored in the memory 504, when executed by the at least one processor 502, causes the computing device to implement the various processes described above with reference to Figs. 1-3B.

The computing device 500 may be implemented in any applicable form in the art, including but not limited to a desktop computer, a laptop computer, a smartphone, a tablet computer, a consumer electronic device, a wearable smart device, and so on.

The embodiments of this specification also provide a machine-readable storage medium. The machine-readable storage medium may store executable code that, when executed by a machine, causes the machine to implement the specific processes of the method embodiments described above with reference to Figs. 1-3B.

For example, the machine-readable storage medium may include, but is not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), static random access memory (SRAM), hard disks, flash memory, and so on.
It should be understood that the embodiments in this specification are described in a progressive manner; for the same or similar parts among the embodiments, reference may be made to each other, and each embodiment focuses on its differences from the other embodiments. For example, the embodiments concerning the apparatus, the computing device, and the machine-readable storage medium are described relatively simply because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.

Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

Not all steps and units in the above flows and system structure diagrams are necessary; some steps or units may be omitted according to actual needs. The apparatus structures described in the above embodiments may be physical structures or logical structures; that is, some units may be implemented by the same physical entity, some units may be implemented respectively by multiple physical entities, or some units may be implemented jointly by certain components in multiple independent devices.

The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration" and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology; however, the technology may be practiced without these specific details. In some instances, well-known structures and apparatuses are shown in block diagram form to avoid obscuring the concepts of the described embodiments.

Optional implementations of the embodiments of the present disclosure have been described above in detail with reference to the drawings. However, the embodiments of the present disclosure are not limited to the specific details of the above implementations; within the scope of the technical concept of the embodiments of the present disclosure, various modifications may be made to the technical solutions, and all such modifications fall within the scope of protection of the embodiments of the present disclosure.

The above description of the present disclosure is provided to enable any person of ordinary skill in the art to implement or use the present disclosure. Various modifications to the present disclosure will be apparent to those of ordinary skill in the art, and the general principles defined herein may also be applied to other variations without departing from the scope of protection of the present disclosure. Therefore, the present disclosure is not limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (16)

  1. A method for text processing, comprising:
    receiving a first text vector and a second text vector, wherein the first text vector represents a user question text, the second text vector represents a candidate question text, the candidate question text is obtained from a knowledge base, and the knowledge base comprises at least one existing question text and an answer text for each existing question text;
    encoding the first text vector and the second text vector using a recurrent neural network (RNN) and a convolutional neural network (CNN), to obtain a first encoding result for the first text vector and a second encoding result for the second text vector;
    determining, based on the first encoding result and the second encoding result, a similarity between the user question text and the candidate question text, wherein the similarity is used to determine a reply to the user question text.
  2. The method according to claim 1, wherein the first encoding result comprises a first initial encoding vector and a first secondary encoding vector, and the second encoding result comprises a second initial encoding vector and a second secondary encoding vector;
    encoding the first text vector and the second text vector comprises:
    encoding the first text vector using the RNN to obtain the first initial encoding vector, and encoding the first initial encoding vector using the CNN to obtain the first secondary encoding vector;
    encoding the second text vector using the RNN to obtain the second initial encoding vector, and encoding the second initial encoding vector using the CNN to obtain the second secondary encoding vector.
  3. The method according to claim 2, wherein determining the similarity between the user question text and the candidate question text comprises:
    processing the first encoding result and the second encoding result based on a soft attention mechanism, to obtain a first interaction representation result for the user question text and a second interaction representation result for the candidate question text;
    determining the similarity based on the first interaction representation result and the second interaction representation result.
  4. The method according to claim 3, wherein processing the first encoding result and the second encoding result based on the soft attention mechanism comprises:
    determining, based on the first initial encoding vector and the second initial encoding vector, a first relative attention vector of the user question text with respect to the candidate question text and a second relative attention vector of the candidate question text with respect to the user question text;
    processing the first relative attention vector, the first initial encoding vector, and the first secondary encoding vector through concatenation and pooling operations, to obtain the first interaction representation result;
    processing the second relative attention vector, the second initial encoding vector, and the second secondary encoding vector through concatenation and pooling operations, to obtain the second interaction representation result.
  5. The method according to claim 3 or 4, wherein determining the similarity based on the first interaction representation result and the second interaction representation result comprises:
    fusing the first interaction representation result and the second interaction representation result based on a gating mechanism, to obtain a fusion result, wherein the gating mechanism is used to select the portions of the first interaction representation result and the second interaction representation result that can be fused;
    determining, based on the fusion result, a probability that the user question text is similar to the candidate question text, as the similarity.
  6. The method according to any one of claims 1 to 5, wherein the CNN comprises at least two levels of convolutional layers; among the at least two levels, a first-level convolutional layer comprises at least one CNN unit with a convolution kernel size of 1, and each convolutional layer after the first-level convolutional layer comprises at least one CNN unit with a convolution kernel size greater than 1.
  7. The method according to any one of claims 1 to 6, wherein the RNN comprises a bi-directional gated recurrent unit (BiGRU).
  8. An apparatus for text processing, comprising:
    a receiving module that receives a first text vector and a second text vector, wherein the first text vector represents a user question text, the second text vector represents a candidate question text, the candidate question text is obtained from a knowledge base, and the knowledge base comprises at least one existing question text and an answer text for each existing question text;
    an encoding module that encodes the first text vector and the second text vector using a recurrent neural network (RNN) and a convolutional neural network (CNN), to obtain a first encoding result for the first text vector and a second encoding result for the second text vector;
    a determining module that determines, based on the first encoding result and the second encoding result, a similarity between the user question text and the candidate question text, wherein the similarity is used to determine a reply to the user question text.
  9. The apparatus according to claim 8, wherein the first encoding result comprises a first initial encoding vector and a first secondary encoding vector, and the second encoding result comprises a second initial encoding vector and a second secondary encoding vector;
    the encoding module performs the following operations:
    encoding the first text vector using the RNN to obtain the first initial encoding vector, and encoding the first initial encoding vector using the CNN to obtain the first secondary encoding vector;
    encoding the second text vector using the RNN to obtain the second initial encoding vector, and encoding the second initial encoding vector using the CNN to obtain the second secondary encoding vector.
  10. The apparatus according to claim 9, wherein the determining module performs the following operations:
    processing the first encoding result and the second encoding result based on a soft attention mechanism, to obtain a first interaction representation result for the user question text and a second interaction representation result for the candidate question text;
    determining the similarity based on the first interaction representation result and the second interaction representation result.
  11. The apparatus according to claim 10, wherein the determining module performs the following operations:
    determining, based on the first initial encoding vector and the second initial encoding vector, a first relative attention vector of the user question text with respect to the candidate question text and a second relative attention vector of the candidate question text with respect to the user question text;
    processing the first relative attention vector, the first initial encoding vector, and the first secondary encoding vector through concatenation and pooling operations, to obtain the first interaction representation result;
    processing the second relative attention vector, the second initial encoding vector, and the second secondary encoding vector through concatenation and pooling operations, to obtain the second interaction representation result.
  12. The apparatus according to claim 10 or 11, wherein the determining module performs the following operations:
    fusing the first interaction representation result and the second interaction representation result based on a gating mechanism, to obtain a fusion result, wherein the gating mechanism is used to select the portions of the first interaction representation result and the second interaction representation result that can be fused;
    determining, based on the fusion result, a probability that the user question text is similar to the candidate question text, as the similarity.
  13. The apparatus according to any one of claims 8 to 12, wherein the CNN comprises at least two levels of convolutional layers; among the at least two levels, a first-level convolutional layer comprises at least one CNN unit with a convolution kernel size of 1, and each convolutional layer after the first-level convolutional layer comprises at least one CNN unit with a convolution kernel size greater than 1.
  14. The apparatus according to any one of claims 8 to 13, wherein the RNN comprises a bi-directional gated recurrent unit (BiGRU).
  15. A computing device, comprising:
    at least one processor;
    a memory in communication with the at least one processor, the memory storing executable code that, when executed by the at least one processor, causes the at least one processor to implement the method according to any one of claims 1 to 7.
  16. A machine-readable storage medium storing executable code that, when executed, causes a machine to perform the method according to any one of claims 1 to 7.
PCT/CN2020/132636 2020-02-28 2020-11-30 Method and apparatus for text processing WO2021169453A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010127184.3 2020-02-28
CN202010127184.3A CN111339256A (zh) 2020-02-28 Method and apparatus for text processing

Publications (1)

Publication Number Publication Date
WO2021169453A1 true WO2021169453A1 (zh) 2021-09-02

Family

ID=71182029

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/132636 WO2021169453A1 (zh) 2020-02-28 2020-11-30 用于文本处理的方法和装置

Country Status (2)

Country Link
CN (1) CN111339256A (zh)
WO (1) WO2021169453A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115169489A (zh) * 2022-07-25 2022-10-11 北京百度网讯科技有限公司 Data retrieval method, apparatus, device, and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339256A (zh) * 2020-02-28 2020-06-26 支付宝(杭州)信息技术有限公司 Method and apparatus for text processing
CN111859939B (zh) * 2020-07-29 2023-07-25 中国平安人寿保险股份有限公司 Text matching method, system, and computer device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967257A (zh) * 2017-11-20 2018-04-27 哈尔滨工业大学 A cascaded essay generation method
CN109766418A (zh) * 2018-12-13 2019-05-17 北京百度网讯科技有限公司 Method and apparatus for outputting information
US20190171720A1 (en) * 2015-08-25 2019-06-06 Alibaba Group Holding Limited Method and system for generation of candidate translations
CN111339256A (zh) * 2020-02-28 2020-06-26 支付宝(杭州)信息技术有限公司 Method and apparatus for text processing

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10606846B2 (en) * 2015-10-16 2020-03-31 Baidu Usa Llc Systems and methods for human inspired simple question answering (HISQA)
CN105574133A (zh) * 2015-12-15 2016-05-11 苏州贝多环保技术有限公司 A multimodal intelligent question answering system and method
CN107562812B (zh) * 2017-08-11 2021-01-15 北京大学 A cross-modal similarity learning method based on modality-specific semantic space modeling
CN108153737A (zh) * 2017-12-30 2018-06-12 北京中关村科金技术有限公司 A semantic classification method and system, and a dialogue processing system
KR101993001B1 (ko) * 2019-01-16 2019-06-25 영남대학교 산학협력단 Apparatus and method for producing video highlights

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190171720A1 (en) * 2015-08-25 2019-06-06 Alibaba Group Holding Limited Method and system for generation of candidate translations
CN107967257A (zh) * 2017-11-20 2018-04-27 哈尔滨工业大学 A cascaded essay generation method
CN109766418A (zh) * 2018-12-13 2019-05-17 北京百度网讯科技有限公司 Method and apparatus for outputting information
CN111339256A (zh) * 2020-02-28 2020-06-26 支付宝(杭州)信息技术有限公司 Method and apparatus for text processing


Also Published As

Publication number Publication date
CN111339256A (zh) 2020-06-26

Similar Documents

Publication Publication Date Title
US11631007B2 (en) Method and device for text-enhanced knowledge graph joint representation learning
WO2020228376A1 (zh) Text processing method, model training method, and apparatus
WO2021169453A1 (zh) Method and apparatus for text processing
US9807473B2 (en) Jointly modeling embedding and translation to bridge video and language
CN111368993B (zh) A data processing method and related device
Bu et al. Learning high-level feature by deep belief networks for 3-D model retrieval and recognition
US11526698B2 (en) Unified referring video object segmentation network
US11010664B2 (en) Augmenting neural networks with hierarchical external memory
CN111159485B (zh) Tail entity linking method, apparatus, server, and storage medium
US20230023271A1 (en) Method and apparatus for detecting face, computer device and computer-readable storage medium
CN107491782A (zh) Image classification method for small amounts of training data using semantic space information
Yang et al. Deep attention-guided hashing
CN114329029A (zh) Object retrieval method, apparatus, device, and computer storage medium
CN113761868A (zh) Text processing method, apparatus, electronic device, and readable storage medium
CN114357151A (zh) Processing method, apparatus, device, and storage medium for a text category recognition model
CN115730597A (zh) Multi-level semantic intent recognition method and related devices
CN115410199A (zh) Image content retrieval method, apparatus, device, and storage medium
Amara et al. Cross-network representation learning for anchor users on multiplex heterogeneous social network
Sun et al. Graph force learning
CN111581392A (zh) An automatic essay scoring method based on sentence fluency
CN110347853B (zh) An image hash code generation method based on a recurrent neural network
CN111988668A (zh) A video recommendation method, apparatus, computer device, and storage medium
WO2023173552A1 (zh) Method for establishing a target detection model, application method, device, apparatus, and medium
CN109033084B (zh) A semantic hierarchy tree construction method and apparatus
US20240037335A1 (en) Methods, systems, and media for bi-modal generation of natural languages and neural architectures

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20922299

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20922299

Country of ref document: EP

Kind code of ref document: A1