CN111274808B - Text retrieval method, model training method, text retrieval device, and storage medium - Google Patents

Text retrieval method, model training method, text retrieval device, and storage medium

Info

Publication number
CN111274808B
Authority
CN
China
Prior art keywords
text
word
training
vector
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010086368.XA
Other languages
Chinese (zh)
Other versions
CN111274808A (en)
Inventor
陈晓军
崔恒斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010086368.XA
Publication of CN111274808A
Application granted
Publication of CN111274808B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/355: Class or cluster creation or modification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This specification relates to a text retrieval method comprising: segmenting a received first text to obtain at least one word; recalling at least one second text from a knowledge base according to the at least one word; inputting the at least one word into a trained text vector model to obtain a text vector of the first text; recalling at least one third text from the knowledge base according to the text vector of the first text; and fusing the at least one second text and the at least one third text to obtain a text retrieval result. The specification also provides training methods for a word weight model and a text vector model, a text retrieval apparatus, an electronic device, and a computer-readable storage medium.

Description

Text retrieval method, model training method, text retrieval device, and storage medium
Technical Field
The present disclosure relates to the field of natural language processing, and in particular, to a text retrieval method, a model training method, a text retrieval device, an electronic apparatus, and a computer readable storage medium.
Background
Text retrieval, also known as natural language retrieval, refers to the process of retrieving, classifying, and filtering a collection of texts according to the content of the texts, such as the words and semantics they contain. Text retrieval, like image retrieval and voice retrieval, is a branch of information retrieval. Generally, the result of text retrieval can be measured by two basic metrics: precision and recall. Precision generally refers to the ratio of retrieved relevant documents to all retrieved documents; recall generally refers to the ratio of retrieved relevant documents to the total number of relevant documents. How to improve the precision or recall of text retrieval is therefore a key problem that text retrieval needs to solve.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a text retrieval method, which may include: segmenting a received first text to obtain at least one word; recalling at least one second text from a knowledge base according to the at least one word; inputting the at least one word into a trained text vector model to obtain a text vector of the first text; recalling at least one third text from the knowledge base according to the text vector of the first text; and fusing the at least one second text and the at least one third text to obtain a text retrieval result.
In an embodiment of the present disclosure, recalling at least one second text from the knowledge base according to the at least one word may include: determining word weights of the at least one word respectively; determining at least one keyword from the at least one word according to the word weight of the at least one word; and recalling at least one second text from the knowledge base in accordance with the at least one keyword.
In an embodiment of the present specification, the determining the word weights of the at least one word respectively may include: and respectively inputting the at least one word into a trained word weight model to obtain the word weight of the at least one word.
In an embodiment of the present specification, the word weight model may include: an encoder and a linear transformation layer; the encoder encodes the at least one word respectively to obtain word vectors of the at least one word; the linear transformation layer respectively performs linear transformation on the word vectors of the at least one word to obtain the word weights of the at least one word.
In an embodiment of the present specification, the determining the word weight of the at least one word may include: and determining the word weight of the at least one word according to a word frequency-inverse text frequency index TF-IDF algorithm.
In an embodiment of the present specification, fusing the at least one second text and the at least one third text may include: and merging the at least one second text and the at least one third text to obtain the text retrieval result.
In an embodiment of the present specification, fusing the at least one second text and the at least one third text may include: inputting the at least one second text and the at least one third text into a trained text vector model, respectively, and determining a text vector of the at least one second text and a text vector of the at least one third text; averaging the text vectors of the at least one second text to obtain an average vector of the second text, and performing linear transformation on the average vector of the second text to obtain an average weight of the second text; averaging the text vectors of the at least one third text to obtain an average vector of the third text, and performing linear transformation on the average vector of the third text to obtain an average weight of the third text; determining the at least one second text as the text retrieval result in response to the average weight of the second text being greater than or equal to the average weight of the third text; and determining the at least one third text as the text retrieval result in response to the average weight of the second text being less than the average weight of the third text.
The embodiment of the specification provides a method for training a word weight model, which may include:
acquiring training data, wherein the training data comprises a plurality of training texts and known output corresponding to each training text; wherein each training text comprises at least one second word; the known output is a word weight of the at least one second word;
for each training text, inputting at least one second word obtained by segmenting the training text into the encoder, and generating a word vector of the at least one second word according to the current values of the parameters of the encoder; inputting the word vector of the at least one second word into a linear transformation layer, generating word weights of the at least one second word according to the current values of the parameters of the linear transformation layer, and taking the word weights of the at least one second word as the predicted output of the training text; determining a gradient based on an error between the predicted output and the known output of the training text; and back-propagating the gradient to the encoder and the linear transformation layer to jointly adjust the current values of the parameters of the encoder and the linear transformation layer.
Embodiments of the present disclosure provide a method for training a text vector model, which may include:
acquiring second training data, wherein the second training data comprises a plurality of groups of training text pairs and known output corresponding to each training text pair; wherein each training text pair comprises a first training text and a second training text; the known output is a degree of matching of the first training text and the second training text;
respectively inputting a first training text and a second training text of each training text pair into a text vector model, and generating a first training text vector corresponding to the first training text and a second training text vector corresponding to the second training text according to the current value of the parameters of the text vector model; determining the matching degree of the first training text and the second training text according to the first training text vector and the second training text vector, and taking the matching degree as the prediction output of the training text pair; determining a gradient based on an error between a predicted output and a known output of the training text pair; the gradient is back-propagated to the text vector model to adjust the current values of parameters of the text vector model.
Embodiments of the present disclosure provide a method for training a text vector model, which may include:
acquiring second training data, wherein the second training data comprises a plurality of groups of training text pairs and known output corresponding to each training text pair; wherein each training text pair comprises a first training text and a second training text; the known output is a degree of matching of the first training text and the second training text;
for each training text pair, respectively inputting the first training text and the second training text of the training text pair into a BERT model, and generating a first training text vector corresponding to the first training text and a second training text vector corresponding to the second training text according to the current values of the parameters of the BERT model; determining the matching degree of the first training text and the second training text according to the first training text vector and the second training text vector, and taking the matching degree as the predicted output of the training text pair; determining a gradient based on an error between the predicted output and the known output of the training text pair; and back-propagating the gradient to the BERT model to adjust the current values of the parameters of the BERT model;
After the BERT model training is completed, the text vector model is trained using model distillation according to the trained BERT model.
In an embodiment of the present specification, the text vector model may include a CNN model or an LSTM model.
Embodiments of the present specification provide a text retrieval apparatus that may include:
the word segmentation module is used for segmenting the received first text to obtain at least one word;
a word recall module for recalling at least one second text from the knowledge base in accordance with the at least one word;
the text vector generation module is used for inputting the at least one word into a trained text vector model to obtain a text vector of the first text;
a vector recall module for recalling at least one third text from the knowledge base according to the vector of the first text; and
and the fusion module is used for fusing the at least one second text and the at least one third text to obtain a text retrieval result.
In an embodiment of the present disclosure, the term recall module includes:
the word weight model is used for respectively inputting the at least one word into the trained word weight model to obtain the word weight of the at least one word;
A keyword determining unit configured to determine at least one keyword from the at least one word according to a word weight of the at least one word; and
and the word recall unit is used for recalling at least one second text from the knowledge base according to the at least one keyword.
In an embodiment of the present specification, the word weight model includes:
an encoder and a linear transformation layer; wherein:
the encoder encodes the at least one word respectively to obtain a word vector of the at least one word;
the linear transformation layer respectively carries out linear transformation on the word vector of the at least one word to obtain the word weight of the at least one word.
In an embodiment of the present disclosure, the term recall module includes:
a word weight determining unit, configured to determine a word weight of the at least one word according to a word frequency-inverse text frequency index TF-IDF algorithm;
a keyword determining unit configured to determine at least one keyword from the at least one word according to a word weight of the at least one word; and
and the word recall unit is used for recalling at least one second text from the knowledge base according to the at least one keyword.
In an embodiment of the present disclosure, the fusion module includes:
and the union unit is used for merging the at least one second text and the at least one third text to obtain the text retrieval result.
In an embodiment of the present disclosure, the fusion module includes:
a trained text vector model for encoding the at least one second text and the at least one third text, respectively, determining a text vector for the at least one second text and a text vector for the at least one third text;
the text average weight determining module is used for averaging the text vectors of the at least one second text to obtain an average vector of the second text, and carrying out linear transformation on the average vector of the second text to obtain an average weight of the second text; averaging the text vectors of the at least one third text to obtain an average vector of the third text, and performing linear transformation on the average vector of the third text to obtain an average weight of the third text;
a text retrieval result determining unit configured to determine the at least one second text as the text retrieval result in response to the average weight of the second text being greater than or equal to the average weight of the third text; and determine the at least one third text as the text retrieval result in response to the average weight of the second text being less than the average weight of the third text.
Embodiments of the present disclosure also provide an electronic device, which may include: memory, a processor and a computer program stored on the memory and executable on the processor, which processor implements the above method when executing the program.
Embodiments of the present specification also provide a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above-described method.
Therefore, in the text retrieval method and apparatus of this specification, after the text to be retrieved input by a user is segmented, word recall is performed according to the segmented words on the one hand, and on the other hand the vector of the text to be retrieved is determined and vector recall is performed. The text retrieval mode of word recall is thus combined with the text retrieval mode of vector recall, so that in addition to texts related at the word level, texts related at the semantic level can also be retrieved. In other words, semantic-level text retrieval is added on the basis of word-level text retrieval, which makes the text retrieval result more comprehensive and improves the recall rate of text retrieval.
In addition, in the embodiments of this specification, during word recall the word weight of each word can further be determined, and keywords are extracted from the segmented words according to their word weights, that is, unimportant words are removed; text retrieval is finally performed in the knowledge base using the keywords. This effectively removes the disturbance caused by unimportant words when retrieving text by words, reduces invalid text recalls, makes the retrieval result more accurate, and thus improves the accuracy of text retrieval. Moreover, because the word weight of each word is determined by a supervised word weight model trained on a large number of pre-labeled data sets, the determined word weights are more accurate, which further improves the accuracy of text retrieval.
Further, in the embodiments of the present specification, the recall results of the word recall and the vector recall may be fused by a fusion model, which ensures that the retrieval result includes the text that best matches the first text input by the user, that is, includes the "best answer" of the text retrieval.
Drawings
In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a text retrieval system 100 according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a text retrieval method according to some embodiments of the present disclosure;
FIG. 3 is a flow diagram illustrating recall of at least one second text from a knowledge base based on at least one term in accordance with some embodiments of the present disclosure;
FIG. 4 is a flow chart illustrating recall of at least one second text from a knowledge base based on at least one term according to further embodiments of the present disclosure;
FIG. 5 is a flow chart of a method of fusing at least one second text and at least one third text according to some embodiments of the present disclosure;
FIG. 6 is a flowchart of a training method of a word weight model according to an embodiment of the present disclosure;
FIG. 7 shows the internal structure of a word weight model according to an embodiment of the present disclosure;
FIG. 8 is a flowchart of a training method of a text vector model according to an embodiment of the present disclosure;
FIG. 9 shows the internal structure of a text vector model according to an embodiment of the present disclosure;
FIG. 10 is a flowchart of another training method for a text vector model according to an embodiment of the present disclosure;
FIG. 11 shows the internal structure of a text vector model according to an embodiment of the present disclosure;
FIG. 12 is a flowchart of a training method of a fusion model according to an embodiment of the present disclosure;
FIG. 13 shows the internal structure of the fusion model according to the embodiment of the present specification;
fig. 14 shows an internal structure of the text retrieval device according to the embodiment of the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the present specification will be further described in detail below with reference to the accompanying drawings.
It should be noted that, unless otherwise defined, technical or scientific terms used in the embodiments of the present specification should be given the ordinary meaning understood by one of ordinary skill in the art to which the present disclosure pertains. The terms "first", "second", and the like used in this disclosure do not denote any order, quantity, or importance, but are merely used to distinguish one element from another. The word "comprising", "comprises", or the like means that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Terms such as "connected" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, which may change accordingly when the absolute position of the described object changes.
Fig. 1 shows a structure of a text retrieval system 100 according to an embodiment of the present specification. As shown in fig. 1, the text retrieval system 100 may include: at least one client 102, a server (which may also be referred to as a text retrieval device) 104, and a knowledge base 106.
The client 102 is configured to provide a user interface for a user, receive a text to be retrieved input by the user, forward the text to be retrieved to the server 104, and feed back a retrieval result for the text to be retrieved received from the server 104 to the user.
The server 104 is configured to receive a text to be retrieved input by a user from the client 102, perform a series of processing on the text to be retrieved, recall a certain number of texts from the knowledge base 106 according to the processing result, determine a retrieval result for the text to be retrieved from the texts, and return the determined retrieval result to the client 102.
The knowledge base 106 is used for storing a large amount of preset text. In general, the knowledge base 106 may be considered a database or a set of stored texts, where the stored texts may be regarded as the scope of text retrieval. This scope may be the same for all users, i.e., different users may correspond to the same knowledge base. Furthermore, in some embodiments of the present disclosure, since different users pay different degrees of attention to different types of information, different knowledge bases may also be set for different users; that is, for the same text to be retrieved, the texts retrieved by different users from their respective knowledge bases may be different. In the embodiments of the present specification, this scope of text retrieval for a given user is referred to as a knowledge base.
Fig. 2 is a flow chart illustrating a text retrieval method according to some embodiments of the present disclosure. The method may be performed by the server 104 in fig. 1. As shown in fig. 2, the method may include:
in step 202, the received first text is segmented to obtain at least one word.
In an embodiment of the present disclosure, the first text may be a text to be retrieved that is input by the user through the client 102. The first text may be, for example, a question or a sentence, etc. The client 102, after receiving the first text, sends the first text to the server 104 to perform text search within the scope of the preset knowledge base.
In the embodiment of the present specification, the above-described first text may be segmented using various methods, for example, a dictionary-based segmentation method, a statistical-based segmentation method, a rule-based segmentation method, a word-labeling-based segmentation method, an understanding-based segmentation method, and the like. The text retrieval scheme described in the embodiments of the present specification is not limited to the specific word segmentation method used.
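For illustration, the sketch below segments an example text with the open-source jieba tokenizer, one dictionary- and statistics-based segmenter; this is only an assumed choice, and the example sentence is hypothetical. Any of the segmentation methods listed above could be used instead.

```python
# Minimal word-segmentation sketch (assumed: the jieba tokenizer; any other
# dictionary-, statistics-, or rule-based segmenter would work equally well).
import jieba

first_text = "我的钱丢了怎么办呢"   # hypothetical first text ("my money is lost, what should I do")
words = jieba.lcut(first_text)       # e.g. ['我的', '钱', '丢', '了', '怎么办', '呢']
print(words)
```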
At step 204, at least one second text is recalled from the knowledge base in accordance with the at least one term.
In embodiments of the present disclosure, recall of the at least one second text from the knowledge base based on the at least one term in step 204 described above may be implemented in a variety of ways. Specifically, the word weight of the at least one word may be first determined; then, determining keywords of the first text according to the determined word weights; and finally, carrying out text recall by utilizing the determined keywords.
FIG. 3 illustrates a flow of a method for recalling at least one second text from a knowledge base based on at least one word, in accordance with some embodiments of the present specification. As shown in fig. 3, the method specifically may include:
in step 302, the at least one word is respectively input into a trained word weight model to obtain the word weight of the at least one word;
determining at least one keyword from the at least one word according to the word weight of the at least one word in step 304; and
at step 306, at least one second text is recalled from the knowledge base in accordance with the at least one keyword.
In an embodiment of the present specification, the word weight model may include: an encoder and a linear transform layer; the encoder encodes the at least one word respectively to obtain word vectors of the at least one word; the linear transformation layer respectively carries out linear transformation on the word vector of the at least one word to obtain the word weight of the at least one word.
In particular, the encoder may be implemented by a variety of machine learning models, such as at least one of a trained BERT model, a convolutional neural network (CNN) model, or a long short-term memory (LSTM) model. The trained encoder can encode the plurality of words obtained by word segmentation into a plurality of word vectors with good performance. The training method for the encoder will be described in detail later.
In the embodiments of the present specification, the purpose of the above-described linear transformation layer is to use a numerical value to represent the word vector of each word, and the numerical value may represent the importance of the word.
In some embodiments of the present disclosure, the linear transformation layer may be a coefficient matrix of 1×N or N×1, where N is the dimension of the word vectors of the at least one word. In this case, the trained linear transformation layer may perform a linear transformation on a word vector to obtain a real value, and may directly use the real value as the word weight of the word.
In other embodiments of the present disclosure, in addition to the coefficient matrix, the linear transformation layer may further include a normalization unit for normalizing the real value to obtain a word weight in the range [0, 1]. For convenience of description, the real value may be referred to as the word weight value of the word in the embodiments of this specification. The normalization unit may be implemented in various ways; for example, the word weight values corresponding to the at least one word may be normalized using a sigmoid function. After normalization, the word weight of each word lies in the range [0, 1], so the importance of different words can be compared more easily, which makes the word weights more convenient to use in practical applications.
In some embodiments of the present disclosure, when the word weight of the word is a real value and not subjected to normalization processing, a keyword threshold may be determined by using an average value of the word weights of the at least one word, for example, the keyword threshold may be an average value of the word weights of the at least one word. When the word weight of a word is greater than or equal to the keyword threshold, the word is determined to be a keyword.
In other embodiments of the present disclosure, when the word weight of a word is a normalized value in the range [0, 1], a keyword threshold, for example 0.5, may be predetermined. When the word weight of a word is greater than or equal to the keyword threshold, the word is determined to be a keyword.
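As an illustration of the structure described above, the following sketch assumes a PyTorch implementation in which the encoder feeds a 1×N linear transformation layer followed by a sigmoid normalization unit, and keywords are then selected against a preset threshold. All class names, dimensions, and the 0.5 threshold are illustrative assumptions rather than requirements of the specification.

```python
import torch
import torch.nn as nn

class WordWeightModel(nn.Module):
    """Sketch of the word weight model: an encoder followed by a 1 x N linear
    transformation layer and a sigmoid normalization unit (assumed PyTorch)."""
    def __init__(self, encoder, hidden_dim):
        super().__init__()
        self.encoder = encoder                    # e.g. a trained BERT/CNN/LSTM encoder
        self.linear = nn.Linear(hidden_dim, 1)    # the 1 x N coefficient matrix

    def forward(self, word_ids):
        word_vectors = self.encoder(word_ids)          # (num_words, hidden_dim)
        raw_weights = self.linear(word_vectors)        # (num_words, 1) real values
        return torch.sigmoid(raw_weights).squeeze(-1)  # word weights normalized to [0, 1]

def select_keywords(words, weights, threshold=0.5):
    """Keep the words whose weight reaches the keyword threshold."""
    return [w for w, wt in zip(words, weights.tolist()) if wt >= threshold]
```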
FIG. 4 illustrates a flow of a method for recalling at least one second text from a knowledge base based on at least one word, in accordance with some embodiments of the present specification. As shown in fig. 4, the method specifically may include:
determining a word weight of the at least one word according to a word frequency-inverse text frequency index (TF-IDF) algorithm at step 402;
at step 404, determining at least one keyword from the at least one word according to the word weight of the at least one word; and
At step 406, at least one second text is recalled from the knowledge base in accordance with the at least one keyword.
In the embodiments of the present description, as can be seen from FIGS. 3 and 4, the word recall described above can be achieved by keyword matching. This is because each text stored in the knowledge base is pre-labeled with one or more corresponding keywords. Therefore, in steps 306 and 406, various keyword matching methods can be used to find, in the knowledge base, texts whose keywords match the at least one keyword of the first text; the found texts are scored according to a preset scoring policy, and the texts with the highest scores are returned as the recall result (i.e., the second texts). Note that the number of returned second texts may be preset. The embodiments of the present disclosure do not limit the keyword matching method used by the server 104. In the embodiments of this specification, this text recall mode is referred to simply as word recall.
It should be noted that, the keywords of the texts stored in the knowledge base may be labeled in advance, for example, manually or by other means, or may, of course, be determined in advance by the word weight model given in the present specification, and labeled for each text in the knowledge base.
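The sketch below shows one possible form of such keyword-based word recall, assuming the knowledge-base texts carry pre-labeled keywords and using a simple matched-keyword-count score; the specification does not prescribe a specific matching method or scoring policy, so both choices are illustrative.

```python
from collections import defaultdict

def build_keyword_index(knowledge_base):
    """knowledge_base: {text_id: [pre-labeled keywords]} -> inverted keyword index."""
    index = defaultdict(set)
    for text_id, keywords in knowledge_base.items():
        for kw in keywords:
            index[kw].add(text_id)
    return index

def word_recall(query_keywords, index, top_k=10):
    """Score each knowledge-base text by its number of matched keywords and
    return the top_k texts as the word-recall result (the 'second texts')."""
    scores = defaultdict(int)
    for kw in query_keywords:
        for text_id in index.get(kw, ()):
            scores[text_id] += 1
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```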
According to the above embodiments, after the text to be retrieved input by the user is segmented, the word weight of each word can be determined, and keywords are extracted from the segmented words according to their word weights, that is, unimportant words are removed; finally, text retrieval is performed in the knowledge base using the keywords. This effectively removes the disturbance caused by unimportant words when retrieving text by words, reduces invalid text recalls, makes the retrieval result more accurate, and improves the accuracy of text retrieval. Moreover, because some embodiments of the present disclosure use a supervised word weight model when determining the word weight of each word, and this model is trained on a large number of pre-labeled data sets, the determined word weights are more accurate, which further improves the accuracy of text retrieval.
At step 206, the at least one term is input into a trained text vector model to obtain a text vector for the first text.
In embodiments of the present description, the above-described text vector model may be implemented by a variety of machine learning models, such as by at least one of a trained BERT model, a CNN model, or an LSTM model.
The trained text vector model can encode the plurality of words obtained by word segmentation into a text vector with good performance. The training method for the text vector model will be described in detail later.
At step 208, at least one third text is recalled from the knowledge base in accordance with the vector of the first text.
In the embodiments of the present disclosure, text vectors have already been extracted for the texts stored in the knowledge base, and a vector index has been established in advance. Therefore, in step 208, a vector search may be performed using the vector index according to the determined vector of the first text, to find, among the texts stored in the knowledge base, texts whose text vectors match the text vector of the first text; the found texts are scored according to a preset scoring policy, and the texts with the highest scores are returned as the recall result (i.e., the third texts). Note that the number of returned third texts may be preset. There are many vector search methods that use a vector index; for example, the search may be implemented using the HNSW algorithm or an Elasticsearch engine. The embodiments of the present specification do not limit the vector search method used by the server 104. In the embodiments of the present disclosure, the text recall mode described in steps 206-208 is referred to simply as vector recall.
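As one concrete, assumed realization of the vector recall described above, the sketch below builds an HNSW index with the open-source hnswlib package; the vector dimension, index parameters, and placeholder data are illustrative.

```python
import hnswlib
import numpy as np

dim = 256                                   # assumed text-vector dimension
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=100_000, ef_construction=200, M=16)

# kb_vectors: text vectors previously extracted for every knowledge-base text
kb_vectors = np.random.rand(1000, dim).astype(np.float32)     # placeholder data
index.add_items(kb_vectors, ids=np.arange(len(kb_vectors)))   # pre-built vector index

def vector_recall(first_text_vector, top_k=10):
    """Return the ids of the top_k knowledge-base texts whose vectors are
    closest to the vector of the first text (the 'third texts')."""
    labels, distances = index.knn_query(first_text_vector, k=top_k)
    return labels[0].tolist()
```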
It should be noted that, the steps 204 and the steps 206-208 may be two processes performed in parallel, and the order of the step numbers does not represent the execution sequence.
After the at least one second text and the at least one third text are obtained, the server 104 may further fuse the at least one second text and the at least one third text to obtain a text search result in step 210.
In the embodiments of the present description, the above fusion may also be achieved in a variety of ways.
In some embodiments of the present disclosure, the server 104 may take the union of the at least one second text and the at least one third text (i.e., merge them and remove duplicates) to obtain the text retrieval result.
In other embodiments of the present disclosure, the server 104 may implement the fusion using a fusion model, and the specific fusion process may be as shown in fig. 5, and includes:
at step 502, the at least one second text and the at least one third text are respectively input into a trained text vector model, and a text vector of the at least one second text and a text vector of the at least one third text are determined.
In an embodiment of the present disclosure, the text vector model may be the text vector model described in step 206.
In step 504, the text vector of the at least one second text is averaged to obtain an average vector of the second text, and the average vector of the second text is linearly transformed to obtain an average weight of the second text.
In step 506, the text vector of the at least one third text is averaged to obtain an average vector of the third text, and the average vector of the third text is linearly transformed to obtain an average weight of the third text.
In step 508, in response to the average weight of the second text being greater than or equal to the average weight of the third text, taking the at least one second text as a text search result; and responding to the fact that the average weight value of the second text is smaller than the average weight of the third text, and taking the at least one third text as a text retrieval result.
In the embodiments of the present specification, the linear transformation of the average vector of the second texts and the average vector of the third texts may be implemented by a trained linear transformation model. The goal of training the linear transformation model is to make the average weight of the group of texts that contains the text best matching the first text, i.e., the group containing the "best answer", greater than the average weight of the other group. In this way, the group of texts with the larger average weight can be selected as the retrieval result according to the average weights of the second texts and the third texts, which ensures that the output group contains the text best matching the first text, that is, the "best answer" of the text retrieval, so that the user obtains the best text retrieval result.
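Putting steps 502-508 together, the following sketch shows how the fusion could look in a PyTorch-style implementation. Here `text_vector_model` and `linear_model` stand for the trained text vector model and the trained linear transformation model; they are assumed interfaces rather than APIs defined by the specification.

```python
import torch

def fuse(second_texts, third_texts, text_vector_model, linear_model):
    """Sketch of the fusion step (assumed PyTorch): encode both recall groups,
    average each group's text vectors, linearly transform each average vector
    into an average weight, and keep the group with the larger weight."""
    vecs2 = torch.stack([text_vector_model(t) for t in second_texts])
    vecs3 = torch.stack([text_vector_model(t) for t in third_texts])
    weight2 = linear_model(vecs2.mean(dim=0))   # average weight of the word-recall group
    weight3 = linear_model(vecs3.mean(dim=0))   # average weight of the vector-recall group
    return second_texts if weight2.item() >= weight3.item() else third_texts
```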
Therefore, the above embodiments of the present specification combine the text retrieval mode of word recall with the text retrieval mode of vector recall, so that in addition to texts related at the word level, texts related at the semantic level can also be retrieved; that is, semantic-level text recall is added on the basis of word-level text recall, which makes the text retrieval result more comprehensive and improves the recall rate of text retrieval.
Further, as described above, the fusion model ensures that the retrieval result includes the text that best matches the first text, that is, includes the "best answer" of the text retrieval.
The training method of each model described above is described in detail below in connection with specific examples.
FIG. 6 shows a method flow for training the word weight model described in embodiments of the present specification. As shown in fig. 6, the method may include:
at step 602, training data is obtained, the training data comprising a plurality of training texts and a known output corresponding to each training text, wherein each training text comprises at least one second word; the known output is a word weight of the at least one second word.
In an embodiment of the present disclosure, the second word included in each training text is labeled with an importance tag, and the importance tag identifies an importance level of the word. At this time, the above known output may be specifically: the importance of the at least one second word.
The following steps 604-610 are performed for each training text:
at step 604, inputting at least one second word included in the training text into the encoder, and generating a word vector of the at least one second word according to a current value of a parameter of the encoder;
inputting the word vector of the at least one second word into the linear transformation layer, and generating word weights of the at least one second word according to the current value of the parameter of the linear transformation layer as the prediction output of the training text in step 606;
determining a gradient based on an error between the predicted output and the known output of the training text at step 608; and
at step 610, the gradient is back-propagated to the encoder and the linear transformation layer to jointly adjust the current values of the parameters of the encoder and the linear transformation layer.
As mentioned above, the known output may specifically be the importance of the at least one second word. Therefore, when the predicted output does not match the known output, the training result is regarded as a "penalty", and the current values of the model parameters are adjusted accordingly; when the predicted output matches the known output, the training result is regarded as a "reward", and the current values of the model parameters are adjusted accordingly.
The training process may be ended when the training reaches a predetermined number of times or the model converges.
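A compact, assumed PyTorch rendering of steps 602-610 is given below; the MSE loss and optimizer settings are illustrative choices, since the specification only requires a gradient computed from the error between the predicted and known word weights.

```python
import torch
import torch.nn as nn

def train_word_weight_model(encoder, linear_layer, training_data, epochs=3, lr=1e-5):
    """Sketch of the joint training loop for the encoder and the linear
    transformation layer (assumed PyTorch; loss and hyperparameters are illustrative)."""
    optimizer = torch.optim.Adam(
        list(encoder.parameters()) + list(linear_layer.parameters()), lr=lr)
    loss_fn = nn.MSELoss()            # error between predicted and known word weights
    for _ in range(epochs):
        for word_ids, known_weights in training_data:
            word_vectors = encoder(word_ids)                                  # step 604
            predicted = torch.sigmoid(linear_layer(word_vectors)).squeeze(-1) # step 606
            loss = loss_fn(predicted, known_weights)                          # step 608
            optimizer.zero_grad()
            loss.backward()           # back-propagate the gradient (step 610)
            optimizer.step()          # jointly adjust encoder and linear layer
```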
Fig. 7 shows an internal structure of the word weight model according to the embodiment of the present specification. As shown in fig. 7, the word weight model may include:
an input layer 702 for receiving at least one word and its corresponding known output. For example, in fig. 6, the at least one term includes: "My", "money", "lost", "the", "what to do", and "woolen" words, where "money" and "lost" are words labeled as important, with higher word weights. The words labeled as important are represented in fig. 6 using shaded boxes.
An encoder 704 for encoding the at least one word and outputting a word vector of the at least one word. For example, the word vectors 1 to 6 shown in fig. 7.
And the linear transformation layer 706 is configured to perform linear transformation on the word vector of the at least one word to obtain a word weight of the at least one word, and output the word weight as a prediction.
For example, the word weights 1 to 6 shown in fig. 7.
A comparison layer 708 for determining a gradient based on the error between the predicted output and the known output, and back-propagating the gradient to the encoder 704 and the linear transformation layer 706 to adjust the current values of their parameters. For example, in FIG. 7, the known output indicates that the word weights of "money" and "lost" should be greater than those of the other words; therefore, the comparison layer 708 compares the predicted output with the known output and jointly adjusts the encoder 704 and the linear transformation layer 706 based on the error between the two.
As previously described, the encoder 704 may be implemented by at least one of a BERT model, a CNN model, and an LSTM model. The linear transformation layer 706 may be embodied as a coefficient matrix of 1×N or N×1, where N is the dimension of the word vector. Alternatively, the linear transformation layer 706 may specifically include a coefficient matrix of 1×N or N×1 and a normalization unit.
It can be seen that, after training is completed, the word weight output by the word weight model can accurately represent the importance degree of the word.
Fig. 8 shows a method flow for training the text vector model according to the embodiments of the present disclosure. As shown in fig. 8, the method may include:
at step 802, second training data is obtained, the second training data comprising a plurality of sets of training text pairs and a corresponding known output for each training text pair.
In the embodiments of the present specification, each training text pair carries a matching degree label. In this case, the known output may specifically be that the matching degree output by the model for the training text pair is consistent with the matching degree label of that pair. That is, when the matching degree label indicates that the training text pair is matched, the matching degree output by the model should be greater than a predetermined threshold; and when the matching degree label indicates that the training text pair is not matched, the matching degree output by the model should be less than or equal to the predetermined threshold.
For each training text pair, the following steps 804-810 are performed, respectively:
in step 804, inputting the first training text and the second training text in the training text pair into a text vector model respectively, and generating a first training text vector corresponding to the first training text and a second training text vector corresponding to the second training text according to the current value of the parameter of the text vector model;
determining the matching degree of the first training text and the second training text according to the first training text vector and the second training text vector as the predicted output of the training text pair in step 806;
determining a gradient based on an error between the predicted output and the known output of the training text pair at step 808; and
at step 810, the gradient is back-propagated to the text vector model to adjust the current values of the parameters of the text vector model.
As mentioned above, the known output may specifically be that the matching degree output by the model for the training text pair is consistent with the matching degree label of that pair. Therefore, when the predicted output does not match the known output, the training result is regarded as a "penalty", and the current values of the model parameters are adjusted accordingly; when the predicted output matches the known output, the training result is regarded as a "reward", and the current values of the model parameters are adjusted accordingly.
The training process may be ended when the training reaches a predetermined number of times or the model converges.
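The sketch below renders steps 802-810 in assumed PyTorch form, using cosine similarity as the matching degree and labels of 1.0/0.0 for matched/unmatched pairs; both are illustrative choices rather than requirements of the specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_text_vector_model(model, training_pairs, epochs=3, lr=1e-5):
    """Sketch of pairwise training of the text vector model (assumed PyTorch).
    training_pairs yields (first_ids, second_ids, label), where label is a
    0./1. tensor: 1.0 for a matched pair, 0.0 for an unmatched pair."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for first_ids, second_ids, label in training_pairs:
            v1 = model(first_ids)                          # first training text vector
            v2 = model(second_ids)                         # second training text vector
            match = F.cosine_similarity(v1, v2, dim=-1)    # predicted matching degree
            loss = loss_fn(match, label)                   # error vs. known output
            optimizer.zero_grad()
            loss.backward()                                # back-propagate to the model
            optimizer.step()
```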
Fig. 9 shows an internal structure of the text vector model according to the embodiment of the present specification. As shown in fig. 9, the text vector model may include:
a second input layer 902 for receiving a text pair and its corresponding known output. For example, in FIG. 9, the first text in the text pair is "what you have lost" the my money; the second text is "how she has lost her bag": and, the matching degree label of the text pair is not matched.
And a second encoder 904 for encoding the texts in the text pairs, respectively, and outputting two text vectors. For example, text vector 1 and text vector 2 shown in fig. 9.
And a matching degree calculating layer 906, configured to determine the matching degree of the two text vectors, and output the matching degree as a prediction.
A second comparison layer 908 for determining a gradient based on the error between the predicted output and the known output; and back-propagating the gradient to the encoder to adjust the current value of the encoder's parameter.
For example, in FIG. 9, the known output is that the text pair "my money has been lost, what should I do" and "her school bag has been lost, what should she do" does not match; therefore, the second comparison layer 908 compares the predicted output with the known output and adjusts the second encoder 904 based on the error between the two.
As previously described, the above-described second encoder 904 may be implemented by at least one of the BERT model, the CNN model, and the LSTM model.
It can be seen that the text vector model can encode a text into a text vector after training is completed, and the text vector can accurately express the semantics of the text.
FIG. 10 shows another method flow for training a text vector model as described herein. As shown in fig. 10, the method may include:
in step 1002, second training data is obtained, where the second training data includes a plurality of sets of training text pairs and a known output corresponding to each training text pair;
for each training text pair, the following steps 1004-1010 are performed:
in step 1004, inputting the first training text and the second training text in the training text pair into a BERT model respectively, and generating a first training text vector corresponding to the first training text and a second training text vector corresponding to the second training text according to the current values of the parameters of the BERT model;
in step 1006, determining the matching degree of the first training text and the second training text according to the first training text vector and the second training text vector, and taking the matching degree as the predicted output of the training text pair;
Determining a gradient based on an error between the predicted output and the known output of the training text pair at step 1008;
at step 1010, the gradient is back-propagated to the BERT model to adjust the current values of the parameters of the BERT model.
Specific implementations of steps 1002-1010 above may refer to steps 802-810 above.
At step 1012, after the BERT model training is completed, a text vector model is trained using a model distillation method according to the trained BERT model.
In embodiments of the present description, the text vector model described above may be implemented by a relatively simple machine learning model such as a CNN model or an LSTM model.
The reason for using model distillation is that the BERT model has a complex structure, consumes relatively large resources during use, and has a long computation time. In order to vectorize text quickly in an application, the BERT model can be used only for model training; after the BERT model has been trained, a CNN model or an LSTM model, which has a relatively simple structure and a higher operation speed, quickly learns from the trained BERT model through model distillation. In practical applications, the CNN or LSTM model trained by model distillation is used to encode the input text into a text vector, and can achieve the effect of the BERT model. Specifically, during model distillation, the same text can be input into the CNN or LSTM model and into the trained BERT model; then, in a distillation-learning manner, the parameters of the CNN or LSTM model are adjusted using the error between the vector output by the CNN or LSTM model and the vector output by the BERT model, thereby completing the training of the CNN or LSTM model.
Fig. 11 illustrates an internal structure of a text vector model according to an embodiment of the present specification. As shown in fig. 11, in addition to the above-described second input layer 902, second encoder 904, matching degree calculation layer 906, and second comparison layer 908, the text vector model may further include: a text vector model 910 for learning the encoder 904 by means of model distillation. As previously described, the second encoder 904 may be a BERT model, and the text vector model 910 may be a CNN model or an LSTM model.
In an embodiment of the present disclosure, to train the text vector model 910 by model distillation, the same text, e.g., "my money has been lost, what should I do", may be input into the text vector model 910 and into the trained BERT model; then, by distillation learning, the parameters of the text vector model 910 are adjusted according to the error between the text vector output by the text vector model 910 (e.g., text vector 3 in FIG. 11) and the text vector output by the BERT model (e.g., text vector 2 in FIG. 11), thereby completing the training of the text vector model 910.
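A minimal sketch of this distillation step, under the assumption of a PyTorch implementation with an MSE objective between teacher and student vectors, is given below; the function and variable names are illustrative.

```python
import torch
import torch.nn as nn

def distill_text_vector_model(teacher_bert, student, texts, epochs=3, lr=1e-4):
    """Sketch of model distillation (assumed PyTorch): the trained BERT model is
    frozen and used as a teacher, and a lighter CNN/LSTM student is trained to
    reproduce the teacher's text vectors."""
    teacher_bert.eval()
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for text_ids in texts:
            with torch.no_grad():
                teacher_vec = teacher_bert(text_ids)   # e.g. text vector 2 in FIG. 11
            student_vec = student(text_ids)            # e.g. text vector 3 in FIG. 11
            loss = loss_fn(student_vec, teacher_vec)   # error between the two vectors
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```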
FIG. 12 shows a process flow for training the fusion model described in this specification. As shown in fig. 12, the method may include:
In step 1202, third training data is obtained, the third training data including a plurality of pairs of training text sets and a known output corresponding to each pair of training text sets;
in an embodiment of the present disclosure, each pair of training text sets includes two training text sets: a first training text set and a second training text set, and one text in one of the two sets is labeled. In this case, the known output may specifically be that the average weight output by the model for the training text set containing the labeled text should be greater than the average weight of the other training text set.
For each pair of training text sets, the following steps 1204-1214 are performed:
in step 1204, a trained text vector model is input into the first training text set and the second training text set, respectively, and a text vector of each training text in the first training text set and a text vector of each training text in the second training text set are determined;
in step 1206, the text vector of each training text in the first training text group is averaged to obtain an average vector of the first training text group;
in step 1208, the text vector of each training text in the second training text group is averaged to obtain an average vector of the second training text group;
In step 1210, the average vector of the first training text set and the average vector of the second training text set are respectively input into a linear transformation model, and an average weight of the first training text set and an average weight of the second training text set are generated according to the current value of the parameter of the linear transformation model and are used as the prediction output of the pair of training text sets;
determining a gradient based on the error between the predicted output and the known output at step 1212;
at step 1214, the gradient is back-propagated to the linear transformation model to adjust the current values of the parameters of the linear transformation model.
As mentioned above, the known output may specifically be that the average weight output by the model for the training text set containing the labeled text should be greater than the average weight of the other training text set. Therefore, when the predicted output does not match the known output, the training result is regarded as a "penalty", and the current values of the model parameters are adjusted accordingly; when the predicted output matches the known output, the training result is regarded as a "reward", and the current values of the model parameters are adjusted accordingly.
The training process may be ended when the training reaches a predetermined number of times or the model converges.
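The sketch below shows one possible, assumed PyTorch rendering of steps 1202-1214, using a hinge-style margin loss to push the average weight of the group containing the labeled text above that of the other group; the specific loss is an illustrative choice, since the specification only requires a gradient derived from the error between the predicted and known outputs.

```python
import torch

def train_fusion_linear_model(text_vector_model, linear_model, training_groups,
                              epochs=3, lr=1e-4):
    """Sketch of training the linear transformation model used in the fusion step
    (assumed PyTorch). training_groups yields (group1, group2, label), where
    label is 1.0 when group1 contains the labeled 'best answer' text, else 0.0."""
    optimizer = torch.optim.Adam(linear_model.parameters(), lr=lr)
    for _ in range(epochs):
        for group1, group2, label in training_groups:
            with torch.no_grad():   # the trained text vector model stays frozen
                avg1 = torch.stack([text_vector_model(t) for t in group1]).mean(dim=0)
                avg2 = torch.stack([text_vector_model(t) for t in group2]).mean(dim=0)
            w1, w2 = linear_model(avg1), linear_model(avg2)    # average weights
            sign = 1.0 if label == 1.0 else -1.0
            loss = torch.relu(1.0 - sign * (w1 - w2))          # hinge-style margin loss
            optimizer.zero_grad()
            loss.backward()         # adjust only the linear transformation model
            optimizer.step()
```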
Fig. 13 shows an internal structure of the fusion model according to the embodiment of the present specification. As shown in fig. 13, the fusion model may include:
a third input layer 1302 for receiving two text groups and their corresponding known output. For example, in FIG. 13, the first text group consists of the m texts recalled by word recall, the second text group consists of the n texts recalled by vector recall, and one text in the first text group is labeled.
The text vector model 1304 is configured to encode texts in the two text groups, and output two text vectors.
And an averaging layer 1306, configured to average the two sets of text vectors to obtain a first set of text average text vectors and a second set of text average text vectors.
A second linear coding model 1308, configured to perform linear transformation on the determined first set of text average text vectors and the determined second set of text average text vectors to obtain average weights of the two text sets as prediction output.
A third comparison layer 1310 for determining a gradient based on the error between the predicted output and the known output, and back-propagating the gradient to the second linear encoding model 1308 to adjust the current values of its parameters. For example, in FIG. 13, the known output is that the average weight of the first text group should be greater than the average weight of the second text group; therefore, the third comparison layer 1310 compares the predicted output with the known output and adjusts the second linear encoding model 1308 based on the error between the two.
It can be seen that after training is completed, the fusion model selects a group of recall results containing the "best answer" from the plurality of groups of recall results.
Corresponding to the above method, the embodiment of the present specification also provides a text searching device, and the internal structure of the device is as shown in fig. 14, and may include:
a word segmentation module 1402, configured to segment the received first text to obtain at least one word;
a word recall module 1404 for recalling at least one second text from the knowledge base based on the at least one word;
a text vector generation module 1406 for inputting the at least one term into a trained text vector model to obtain a text vector for the first text;
a vector recall module 1408 for recalling at least one third text from the knowledge base based on the vector of the first text; and
and the fusion module 1410 is configured to fuse the at least one second text and the at least one third text to obtain a text search result.
In some embodiments of the present description, the word recall module includes:
a trained word weight model, into which the at least one word is respectively input to obtain the word weight of the at least one word;
a keyword determining unit, configured to determine at least one keyword from the at least one word according to the word weight of the at least one word; and
a word recall unit, configured to recall at least one second text from the knowledge base according to the at least one keyword.
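The specification does not fix how the word recall unit searches the knowledge base; one plausible sketch, assuming a simple inverted index over the segmented knowledge-base texts, is:

```python
from collections import defaultdict
from typing import Dict, List, Set

def build_inverted_index(knowledge_base: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """knowledge_base: text id -> the segmented words of that text."""
    index = defaultdict(set)
    for text_id, words in knowledge_base.items():
        for word in words:
            index[word].add(text_id)
    return index

def recall_by_keywords(keywords: List[str], index: Dict[str, Set[str]]) -> List[str]:
    """Recall every knowledge-base text that contains at least one of the keywords."""
    hits = set()
    for keyword in keywords:
        hits |= index.get(keyword, set())
    return list(hits)
```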
In an embodiment of the present specification, the word weight model includes an encoder and a linear transformation layer, wherein:
the encoder encodes the at least one word respectively to obtain a word vector of the at least one word; and
the linear transformation layer performs a linear transformation on the word vector of the at least one word respectively to obtain the word weight of the at least one word.
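A minimal sketch of such a word weight model, assuming an embedding-table encoder and a sigmoid on the linear output (both assumptions not stated in the specification), could look like this:

```python
import torch
import torch.nn as nn

class WordWeightModel(nn.Module):
    """Encoder plus linear transformation layer: one scalar weight per input word."""
    def __init__(self, vocab_size: int, embed_dim: int = 128):
        super().__init__()
        self.encoder = nn.Embedding(vocab_size, embed_dim)   # placeholder encoder (assumption)
        self.linear = nn.Linear(embed_dim, 1)                 # linear transformation layer

    def forward(self, word_ids: torch.Tensor) -> torch.Tensor:
        word_vectors = self.encoder(word_ids)                 # word vector of each word
        weights = self.linear(word_vectors).squeeze(-1)       # linear transformation of each word vector
        return torch.sigmoid(weights)                         # word weight of each word, in (0, 1)
```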
In other embodiments of the present description, the word recall module includes:
a word weight determining unit, configured to determine the word weight of the at least one word according to a term frequency-inverse document frequency (TF-IDF) algorithm;
a keyword determining unit, configured to determine at least one keyword from the at least one word according to the word weight of the at least one word; and
a word recall unit, configured to recall at least one second text from the knowledge base according to the at least one keyword.
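For reference, a plain-Python sketch of TF-IDF word weighting is given below; the smoothing of the IDF term and the top-k keyword selection are illustrative choices, not requirements of the specification.

```python
import math
from collections import Counter
from typing import Dict, List

def tf_idf_weights(words: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Word weight = term frequency in the query text * inverse document frequency over the corpus."""
    tf = Counter(words)
    n_docs = len(corpus)
    weights = {}
    for word, count in tf.items():
        doc_freq = sum(1 for doc in corpus if word in doc)
        idf = math.log((n_docs + 1) / (doc_freq + 1)) + 1     # smoothed IDF (illustrative choice)
        weights[word] = (count / len(words)) * idf
    return weights

# The keywords can then be taken as the top-k words by weight, for example:
# keywords = sorted(weights, key=weights.get, reverse=True)[:k]
```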
In some embodiments of the present disclosure, the fusion module includes:
a union unit, configured to take the union of the at least one second text and the at least one third text to obtain the text retrieval result.
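A minimal sketch of such a union unit, assuming duplicates are dropped while the original recall order is kept, is:

```python
from typing import Iterable, List

def fuse_by_union(second_texts: Iterable[str], third_texts: Iterable[str]) -> List[str]:
    """Union of the two recall lists: duplicates are dropped, original order is kept."""
    seen = set()
    result = []
    for text in list(second_texts) + list(third_texts):
        if text not in seen:
            seen.add(text)
            result.append(text)
    return result
```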
In other embodiments of the present disclosure, the fusion module includes:
a trained text vector model, configured to encode the at least one second text and the at least one third text respectively, to determine a text vector of the at least one second text and a text vector of the at least one third text;
a text average weight determining module, configured to average the text vectors of the at least one second text to obtain an average vector of the second text and perform a linear transformation on the average vector of the second text to obtain an average weight of the second text, and to average the text vectors of the at least one third text to obtain an average vector of the third text and perform a linear transformation on the average vector of the third text to obtain an average weight of the third text; and
a text retrieval result determining unit, configured to determine the at least one second text as the text retrieval result in response to the average weight of the second text being greater than or equal to the average weight of the third text, and to determine the at least one third text as the text retrieval result in response to the average weight of the second text being less than the average weight of the third text.
Further, in the embodiments of the present specification, the above-described text retrieval device may also be implemented as an electronic device, and may therefore include: a memory 1400, a processor 1200, an input/output interface 1600, a communication interface 1800, and a bus 2000, wherein the processor 1200, the memory 1400, the input/output interface 1600, and the communication interface 1800 communicate with one another within the device via the bus 2000.
The memory 1400 may be implemented in the form of a ROM (Read-Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1400 may store an operating system and other application programs, and may also store the modules of the device provided by the embodiments of the present specification, such as the word segmentation module 1402, the word recall module 1404, the text vector generation module 1406, the vector recall module 1408, and the fusion module 1410 described above. When the technical solutions provided by the embodiments of the present specification are implemented by software or firmware, the relevant program code is stored in the memory 1400 and invoked for execution by the processor 1200.
The processor 1200 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute the relevant programs so as to implement the technical solutions provided in the embodiments of the present disclosure.
The input/output interface 1600 is used for connecting an input/output module to realize information input and output. The input/output module may be configured as a component within the device (not shown) or may be external to the device to provide the corresponding functionality. The input devices may include a keyboard, a mouse, a touch screen, a microphone, and various types of sensors, and the output devices may include a display, a speaker, a vibrator, indicator lights, and the like.
The communication interface 1800 is used for connecting a communication module (not shown) to enable communication interaction between the device and other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi, or Bluetooth).
Bus 2000 includes a path to transfer information between elements of the device (e.g., processor 1200, memory 1400, input/output interface 1600, and communication interface 1800).
It should be noted that although the above-described device only shows the processor 1200, the memory 1400, the input/output interface 1600, the communication interface 1800, and the bus 2000, in a specific implementation the device may also include other components necessary for normal operation. Furthermore, it will be understood by those skilled in the art that the above-described device may include only the components necessary to implement the embodiments of the present description, rather than all of the components shown in the drawings.
The computer-readable media of the present embodiments include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
In this specification, each embodiment is described in a progressive manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment mainly describes its differences from the other embodiments. In particular, for the electronic device embodiments as well as the computer storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
Those of ordinary skill in the art will appreciate that the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples. Within the idea of the present description, features in the above embodiments or in different embodiments may also be combined, steps may be implemented in any order, and many other variations of the different aspects of the present description as described above exist, which are not described in detail for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, for simplicity of illustration and discussion, and so as not to obscure the description. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the present description, and also in view of the fact that specifics with respect to implementations of such block diagram devices are highly dependent upon the platform within which the present description is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the specification, it should be apparent to one skilled in the art that the specification can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the present specification has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments of the present specification are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalents, improvements and the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the present description.

Claims (19)

1. A text retrieval method, the method comprising:
word segmentation is carried out on the received first text to obtain at least one word;
carrying out word recall in a knowledge base according to the at least one word, and recalling at least one second text;
inputting the at least one word into a trained text vector model to obtain a text vector of the first text;
carrying out vector recall in the knowledge base according to the vector of the first text, and recalling at least one third text; and
and fusing the at least one second text and the at least one third text to obtain a text retrieval result.
2. The method of claim 1, wherein recalling at least one second text from the knowledge base according to the at least one word comprises:
determining word weights of the at least one word respectively;
determining at least one keyword from the at least one word according to the word weight of the at least one word; and
and recalling at least one second text from the knowledge base according to the at least one keyword.
3. The method of claim 2, wherein determining the word weights of the at least one word separately comprises: respectively inputting the at least one word into a trained word weight model to obtain the word weight of the at least one word.
4. The method of claim 3, the word weight model comprising:
an encoder and a linear transformation layer; wherein
the encoder encodes the at least one word respectively to obtain a word vector of the at least one word;
the linear transformation layer respectively carries out linear transformation on the word vector of the at least one word to obtain the word weight of the at least one word.
5. The method of claim 2, wherein determining the word weights of the at least one word separately comprises: determining the word weight of the at least one word according to a term frequency-inverse document frequency (TF-IDF) algorithm.
6. The method of claim 1, the fusing the at least one second text and the at least one third text comprising:
and merging the at least one second text and the at least one third text to obtain the text retrieval result.
7. The method of claim 1, the fusing the at least one second text and the at least one third text comprising:
inputting the at least one second text and the at least one third text into a trained text vector model, respectively, and determining a text vector of the at least one second text and a text vector of the at least one third text;
averaging the text vectors of the at least one second text to obtain an average vector of the second text, and performing linear transformation on the average vector of the second text to obtain an average weight of the second text;
averaging the text vectors of the at least one third text to obtain an average vector of the third text, and performing linear transformation on the average vector of the third text to obtain an average weight of the third text;
determining that the at least one second text is the text retrieval result in response to the average weight of the second text being greater than or equal to the average weight of the third text; and
and determining that the at least one third text is the text retrieval result in response to the average weight of the second text being less than the average weight of the third text.
8. The method of claim 3, wherein the training method of the word weight model comprises:
acquiring training data, wherein the training data comprises a plurality of training texts and known output corresponding to each training text; wherein each training text comprises at least one second word; the known output is a word weight of the at least one second word;
For each of the training texts,
inputting the at least one second word into an encoder, and generating a word vector of the at least one second word according to the current value of the parameter of the encoder;
inputting the word vector of the at least one second word into a linear transformation layer, generating word weights of the at least one second word according to the current value of the parameter of the linear transformation layer, and taking the word weights of the at least one second word as the prediction output of the training text;
determining a gradient based on an error between a predicted output and a known output of the training text;
the gradient is counter-propagated to the encoder and the linear transformation layer to jointly adjust current values of parameters of the encoder and the linear transformation layer.
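Purely as an illustration of the training procedure recited in claim 8, a sketch using a mean-squared error between the predicted and known word weights (the claim does not fix the error measure) might look like this:

```python
import torch
import torch.nn as nn

def train_word_weight_model(model: nn.Module, training_data, epochs: int = 3, lr: float = 1e-3):
    """training_data: iterable of (word_ids, known_weights) pairs, one pair per training text.
    The encoder and the linear transformation layer inside `model` are adjusted jointly."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for word_ids, known_weights in training_data:
            predicted = model(word_ids)               # prediction output: word weights of the second words
            loss = loss_fn(predicted, known_weights)  # error between prediction output and known output
            optimizer.zero_grad()
            loss.backward()                           # back-propagate the gradient to encoder and linear layer
            optimizer.step()
```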
9. The method of claim 1, wherein the training method of the text vector model comprises:
acquiring second training data, wherein the second training data comprises a plurality of groups of training text pairs and known output corresponding to each training text pair; wherein each training text pair comprises a first training text and a second training text; the known output is a degree of matching of the first training text and the second training text;
For each pair of training texts,
respectively inputting a first training text and a second training text of the training text pair into the text vector model, and generating a first training text vector corresponding to the first training text and a second training text vector corresponding to the second training text according to the current value of the parameter of the text vector model;
determining the matching degree of the first training text and the second training text according to the first training text vector and the second training text vector, and taking the matching degree as the prediction output of the training text pair;
determining a gradient based on an error between a predicted output and a known output of the training text pair;
the gradient is back-propagated to the text vector model to adjust the current values of parameters of the text vector model.
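As an illustration of the training procedure recited in claim 9, the following sketch uses cosine similarity as the matching degree and a mean-squared error against the known output; both choices are assumptions, since the claim does not specify them:

```python
import torch
import torch.nn as nn

def train_text_vector_model(model: nn.Module, pairs, epochs: int = 3, lr: float = 1e-3):
    """pairs: iterable of (first_ids, second_ids, known_match); known_match is a scalar tensor in [0, 1].
    The predicted matching degree is taken as the cosine similarity of the two text vectors."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for first_ids, second_ids, known_match in pairs:
            v1 = model(first_ids)                                # first training text vector
            v2 = model(second_ids)                               # second training text vector
            predicted = torch.cosine_similarity(v1, v2, dim=-1)  # matching degree (prediction output)
            loss = loss_fn(predicted, known_match)               # error against the known output
            optimizer.zero_grad()
            loss.backward()                                      # adjust the text vector model parameters
            optimizer.step()
```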
10. The method of claim 1, wherein the training method of the text vector model comprises:
acquiring second training data, wherein the second training data comprises a plurality of groups of training text pairs and known output corresponding to each training text pair; wherein each training text pair comprises a first training text and a second training text; the known output is a degree of matching of the first training text and the second training text;
For each pair of training texts,
respectively inputting a first training text and a second training text in the training text pair into a BERT model, and generating a first training text vector corresponding to the first training text and a second training text vector corresponding to the second training text according to the current value of the parameter of the BERT model;
determining the matching degree of the first training text and the second training text according to the first training text vector and the second training text vector, and taking the matching degree as the prediction output of the training text pair;
determining a gradient based on an error between a predicted output and a known output of the training text pair;
back-propagating the gradient to the BERT model to adjust current values of parameters of the BERT model; and
after the BERT model training is completed, the text vector model is trained using model distillation according to the trained BERT model.
11. The method of claim 10, the text vector model comprising a CNN model or an LSTM model.
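Claims 10 and 11 recite training the text vector model by model distillation from a trained BERT model; one common distillation scheme, sketched below under the assumption that the student (a CNN or LSTM text vector model) simply regresses onto the teacher's text vectors, is:

```python
import torch
import torch.nn as nn

def distill_text_vector_model(teacher: nn.Module, student: nn.Module, texts,
                              epochs: int = 3, lr: float = 1e-3):
    """Model distillation sketch: the student learns to reproduce the text vectors of the
    already-trained BERT teacher; only the student's parameters are updated."""
    teacher.eval()
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for token_ids in texts:
            with torch.no_grad():
                target = teacher(token_ids)       # teacher (BERT) text vector, kept fixed
            predicted = student(token_ids)        # student text vector
            loss = loss_fn(predicted, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```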
12. A text retrieval apparatus, the apparatus comprising:
the word segmentation module is used for segmenting the received first text to obtain at least one word;
the word recall module is used for carrying out word recall in the knowledge base according to the at least one word and recalling at least one second text;
the text vector generation module is used for inputting the at least one word into a trained text vector model to obtain a text vector of the first text;
the vector recall module is used for carrying out vector recall in the knowledge base according to the vector of the first text and recalling at least one third text; and
and the fusion module is used for fusing the at least one second text and the at least one third text to obtain a text retrieval result.
13. The apparatus of claim 12, the word recall module comprising:
the word weight model is used for respectively inputting the at least one word into the trained word weight model to obtain the word weight of the at least one word;
a keyword determining unit configured to determine at least one keyword from the at least one word according to a word weight of the at least one word; and
and the word recall unit is used for recalling at least one second text from the knowledge base according to the at least one keyword.
14. The apparatus of claim 13, the word weight model comprising:
an encoder and a linear transformation layer; wherein
the encoder encodes the at least one word respectively to obtain a word vector of the at least one word;
the linear transformation layer respectively carries out linear transformation on the word vector of the at least one word to obtain the word weight of the at least one word.
15. The apparatus of claim 12, the word recall module comprising:
a word weight determining unit, configured to determine a word weight of the at least one word according to a term frequency-inverse document frequency (TF-IDF) algorithm;
a keyword determining unit configured to determine at least one keyword from the at least one word according to a word weight of the at least one word; and
and the word recall unit is used for recalling at least one second text from the knowledge base according to the at least one keyword.
16. The apparatus of claim 12, the fusion module comprising:
and the union unit is used for merging the at least one second text and the at least one third text to obtain the text retrieval result.
17. The apparatus of claim 12, the fusion module comprising:
a trained text vector model for encoding the at least one second text and the at least one third text, respectively, determining a text vector for the at least one second text and a text vector for the at least one third text;
The text average weight determining module is used for averaging the text vectors of the at least one second text to obtain an average vector of the second text, and carrying out linear transformation on the average vector of the second text to obtain an average weight of the second text; averaging the text vectors of the at least one third text to obtain an average vector of the third text, and performing linear transformation on the average vector of the third text to obtain an average weight of the third text;
a text retrieval result determining unit configured to determine the at least one second text as the text retrieval result in response to an average weight of the second text being greater than or equal to an average weight of the third text; and determining that the at least one third text is the text retrieval result in response to the average weight of the second text being less than the average weight of the third text.
18. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to any one of claims 1 to 11 when the program is executed.
19. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1 to 11.
CN202010086368.XA 2020-02-11 2020-02-11 Text retrieval method, model training method, text retrieval device, and storage medium Active CN111274808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010086368.XA CN111274808B (en) 2020-02-11 2020-02-11 Text retrieval method, model training method, text retrieval device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010086368.XA CN111274808B (en) 2020-02-11 2020-02-11 Text retrieval method, model training method, text retrieval device, and storage medium

Publications (2)

Publication Number Publication Date
CN111274808A CN111274808A (en) 2020-06-12
CN111274808B true CN111274808B (en) 2023-07-04

Family

ID=70999205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010086368.XA Active CN111274808B (en) 2020-02-11 2020-02-11 Text retrieval method, model training method, text retrieval device, and storage medium

Country Status (1)

Country Link
CN (1) CN111274808B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398534B (en) * 2021-01-05 2023-09-12 上海邮电设计咨询研究院有限公司 Event clustering text retrieval system
CN113360613A (en) * 2021-05-31 2021-09-07 维沃移动通信有限公司 Text processing method and device and electronic equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019668A (en) * 2017-10-31 2019-07-16 北京国双科技有限公司 A kind of text searching method and device

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006309377A (en) * 2005-04-27 2006-11-09 Seiko Epson Corp Document retrieval device, document retrieval method, its program, and recording medium
WO2018076243A1 (en) * 2016-10-27 2018-05-03 华为技术有限公司 Search method and device
CN110019670A (en) * 2017-10-31 2019-07-16 北京国双科技有限公司 A kind of text searching method and device
CN109948036B (en) * 2017-11-15 2022-10-04 腾讯科技(深圳)有限公司 Method and device for calculating weight of participle term
CN108733653B (en) * 2018-05-18 2020-07-10 华中科技大学 Sentiment analysis method of Skip-gram model based on fusion of part-of-speech and semantic information
EP3579125A1 (en) * 2018-06-05 2019-12-11 Sap Se System, computer-implemented method and computer program product for information retrieval
CN109582868A (en) * 2018-11-27 2019-04-05 湖南大学 The search recommended method of preference is clicked based on term vector weighting, support vector regression and user
CN110309278B (en) * 2019-05-23 2021-11-16 泰康保险集团股份有限公司 Keyword retrieval method, device, medium and electronic equipment
CN110309267B (en) * 2019-07-08 2021-05-25 哈尔滨工业大学 Semantic retrieval method and system based on pre-training model
CN110516210B (en) * 2019-08-22 2023-06-27 北京影谱科技股份有限公司 Text similarity calculation method and device
CN110704621B (en) * 2019-09-25 2023-04-21 北京大米科技有限公司 Text processing method and device, storage medium and electronic equipment
CN112256860B (en) * 2020-11-25 2024-01-30 携程计算机技术(上海)有限公司 Semantic retrieval method, system, equipment and storage medium for customer service dialogue content

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019668A (en) * 2017-10-31 2019-07-16 北京国双科技有限公司 A kind of text searching method and device

Also Published As

Publication number Publication date
CN111274808A (en) 2020-06-12

Similar Documents

Publication Publication Date Title
US11314806B2 (en) Method for making music recommendations and related computing device, and medium thereof
CN110750640B (en) Text data classification method and device based on neural network model and storage medium
CN111753167B (en) Search processing method, device, computer equipment and medium
CN110334186B (en) Data query method and device, computer equipment and computer readable storage medium
CN111309878B (en) Search type question-answering method, model training method, server and storage medium
CN112632226B (en) Semantic search method and device based on legal knowledge graph and electronic equipment
CN111753092A (en) Data processing method, model training device and electronic equipment
CN112329460B (en) Text topic clustering method, device, equipment and storage medium
CN113297360B (en) Law question-answering method and device based on weak supervised learning and joint learning mechanism
CN113159187B (en) Classification model training method and device and target text determining method and device
CN112632224B (en) Case recommendation method and device based on case knowledge graph and electronic equipment
CN111274808B (en) Text retrieval method, model training method, text retrieval device, and storage medium
Angadi et al. Multimodal sentiment analysis using reliefF feature selection and random forest classifier
CN113806487A (en) Semantic search method, device, equipment and storage medium based on neural network
CN115374845A (en) Commodity information reasoning method and device
CN113535912B (en) Text association method and related equipment based on graph rolling network and attention mechanism
CN111368081A (en) Method and system for determining selected text content
CN117435685A (en) Document retrieval method, document retrieval device, computer equipment, storage medium and product
CN116230146A (en) Data processing method, training method of ICD (ICD coding) model and related equipment
CN116821781A (en) Classification model training method, text analysis method and related equipment
CN115935195B (en) Text matching method and device, computer readable storage medium and terminal
CN117556275B (en) Correlation model data processing method, device, computer equipment and storage medium
CN114492669B (en) Keyword recommendation model training method, recommendation device, equipment and medium
CN113836289B (en) Entity evolution rule recommendation method and device
CN116992874B (en) Text quotation auditing and tracing method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant