CN111274808A - Text retrieval method, model training method, text retrieval device, and storage medium - Google Patents

Text retrieval method, model training method, text retrieval device, and storage medium

Info

Publication number
CN111274808A
Authority
CN
China
Prior art keywords
text
word
training
vector
model
Prior art date
Legal status
Granted
Application number
CN202010086368.XA
Other languages
Chinese (zh)
Other versions
CN111274808B (en)
Inventor
陈晓军
崔恒斌
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010086368.XA priority Critical patent/CN111274808B/en
Publication of CN111274808A publication Critical patent/CN111274808A/en
Application granted granted Critical
Publication of CN111274808B publication Critical patent/CN111274808B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G06F 16/355: Class or cluster creation or modification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present specification relates to a text retrieval method including: performing word segmentation on a received first text to obtain at least one word; recalling at least one second text from a knowledge base according to the at least one word; inputting the at least one word into a trained text vector model to obtain a text vector of the first text; recalling at least one third text from the knowledge base according to the text vector of the first text; and fusing the at least one second text and the at least one third text to obtain a text retrieval result. The specification also provides training methods for a word weight model and a text vector model, a text retrieval apparatus, an electronic device, and a computer-readable storage medium.

Description

Text retrieval method, model training method, text retrieval device, and storage medium
Technical Field
The present disclosure relates to the field of natural language processing technologies, and in particular, to a text retrieval method, a model training method, a text retrieval device, an electronic device, and a computer-readable storage medium.
Background
Text retrieval, also known as natural language retrieval, refers to the process of retrieving, classifying, and filtering a text collection according to the text content, such as the words and semantics contained in the text. Text retrieval, together with image retrieval, voice retrieval, picture retrieval, and the like, is part of information retrieval. Generally, the result of text retrieval can be measured by two basic metrics, namely accuracy and recall. Accuracy generally refers to the ratio of the number of relevant documents retrieved to the total number of documents retrieved; recall, also known as the recall rate, generally refers to the ratio of the number of relevant documents retrieved to the total number of relevant documents. Therefore, how to improve the accuracy or recall of text retrieval is a key problem to be solved by text retrieval.
Disclosure of Invention
In view of this, an embodiment of the present specification provides a text retrieval method, which may include: performing word segmentation on the received first text to obtain at least one word; recalling at least one second text from a knowledge base according to the at least one word; inputting the at least one word into a trained text vector model to obtain a text vector of the first text; recalling at least one third text from the knowledge base according to the vector of the first text; and fusing the at least one second text and the at least one third text to obtain a text retrieval result.
In an embodiment of the present specification, the recalling at least one second text from the knowledge base according to the at least one word may include: determining word weights of the at least one word respectively; determining at least one keyword from the at least one word according to the word weight of the at least one word; and recalling at least one second text from the knowledge base according to the at least one keyword.
In an embodiment of the present specification, the determining the word weight of the at least one word respectively may include: and respectively inputting the at least one word into the trained word weight model to obtain the word weight of the at least one word.
In an embodiment of the present specification, the word weight model may include: an encoder and a linear transform layer; the encoder encodes the at least one word respectively to obtain a word vector of the at least one word; and the linear transformation layer respectively carries out linear transformation on the word vector of the at least one word to obtain the word weight of the at least one word.
In an embodiment of the present specification, the determining the word weight of each of the at least one word may include: determining a word weight of the at least one word according to a word frequency-inverse text frequency index (TF-IDF) algorithm.
In an embodiment of the present specification, the fusing the at least one second text and the at least one third text may include: and merging the at least one second text and the at least one third text to obtain the text retrieval result.
In an embodiment of the present specification, the fusing the at least one second text and the at least one third text may include: inputting the at least one second text and the at least one third text into the trained text vector model respectively, and determining a text vector of the at least one second text and a text vector of the at least one third text; averaging the text vectors of the at least one second text to obtain an average vector of the second text, and performing linear transformation on the average vector of the second text to obtain an average weight of the second text; averaging the text vectors of the at least one third text to obtain an average vector of the third text, and performing linear transformation on the average vector of the third text to obtain an average weight of the third text; determining the at least one second text as the text retrieval result in response to the average weight of the second text being greater than or equal to the average weight of the third text; and determining the at least one third text as the text retrieval result in response to the average weight value of the second text being smaller than the average weight of the third text.
Embodiments of the present description propose a method of training a word weight model, which may include:
acquiring training data, wherein the training data comprises a plurality of training texts and known output corresponding to each training text; wherein each training text comprises at least one second word; the known output is a word weight of the at least one second word;
for each training text, inputting at least one second word obtained by segmenting the training text into words into the encoder, and generating a word vector of the at least one second word according to the current value of the parameter of the encoder; inputting the word vector of the at least one second word into a linear transformation layer, generating a word weight of the at least one second word according to the current value of the parameter of the linear transformation layer, and taking the word weight of the at least one second word as the prediction output of the training text; determining a gradient based on an error between a predicted output and a known output of the training text; back-propagating the gradient to the encoder and the linear transform layer to jointly adjust current values of parameters of the encoder and the linear transform layer.
An embodiment of the present specification provides a method for training a text vector model, which may include:
acquiring second training data, wherein the second training data comprises a plurality of groups of training text pairs and known outputs corresponding to the training text pairs; wherein each training text pair comprises a first training text and a second training text; the known output is the matching degree of the first training text and the second training text;
respectively inputting a first training text and a second training text of a training text pair into a text vector model aiming at each training text pair, and generating a first training text vector corresponding to the first training text and a second training text vector corresponding to the second training text according to the current values of the parameters of the text vector model; determining the matching degree of the first training text and the second training text according to the first training text vector and the second training text vector, and taking the matching degree as the prediction output of the training text pair; determining a gradient based on an error between a predicted output and a known output of the training text pair; propagating the gradient back to the text vector model to adjust current values of parameters of the text vector model.
An embodiment of the present specification provides a method for training a text vector model, which may include:
acquiring second training data, wherein the second training data comprises a plurality of groups of training text pairs and known outputs corresponding to the training text pairs; wherein each training text pair comprises a first training text and a second training text; the known output is the matching degree of the first training text and the second training text;
for each training text pair, respectively inputting the first training text and the second training text of the pair into a BERT model, and generating a first training text vector corresponding to the first training text and a second training text vector corresponding to the second training text according to the current values of the parameters of the BERT model; determining the matching degree of the first training text and the second training text according to the first training text vector and the second training text vector, and taking the matching degree as the prediction output of the training text pair; determining a gradient based on an error between the predicted output and the known output of the training text pair; and back-propagating the gradient to the BERT model to adjust the current values of the parameters of the BERT model;
after the training of the BERT model is completed, the text vector model is trained using model distillation according to the trained BERT model.
In an embodiment of the present specification, the text vector model may include a CNN model or an LSTM model.
An embodiment of the present specification provides a text retrieval apparatus, which may include:
the word segmentation module is used for segmenting the received first text to obtain at least one word;
the word recalling module is used for recalling at least one second text from the knowledge base according to the at least one word;
the text vector generation module is used for inputting the at least one word into a trained text vector model to obtain a text vector of the first text;
the vector recalling module is used for recalling at least one third text from the knowledge base according to the vector of the first text; and
and the fusion module is used for fusing the at least one second text and the at least one third text to obtain a text retrieval result.
In an embodiment of the present specification, the word recall module includes:
the word weight model is used for inputting the at least one word into the trained word weight model respectively to obtain the word weight of the at least one word;
a keyword determining unit, configured to determine at least one keyword from the at least one word according to a word weight of the at least one word; and
and the word recalling unit is used for recalling at least one second text from the knowledge base according to the at least one keyword.
In an embodiment of the present specification, the word weight model includes:
an encoder and a linear transform layer; wherein,
the encoder is used for encoding the at least one word respectively to obtain a word vector of the at least one word;
and the linear transformation layer respectively carries out linear transformation on the word vector of the at least one word to obtain the word weight of the at least one word.
In an embodiment of the present specification, the word recall module includes:
a word weight determination unit for determining a word weight of the at least one word according to a word frequency-inverse text frequency index TF-IDF algorithm;
a keyword determining unit, configured to determine at least one keyword from the at least one word according to a word weight of the at least one word; and
and the word recalling unit is used for recalling at least one second text from the knowledge base according to the at least one keyword.
In an embodiment of the present specification, the fusion module includes:
and the union unit is used for solving a union set of the at least one second text and the at least one third text to obtain the text retrieval result.
In an embodiment of the present specification, the fusion module includes:
the trained text vector model is used for respectively coding the at least one second text and the at least one third text and determining a text vector of the at least one second text and a text vector of the at least one third text;
the text average weight determining module is used for averaging the text vectors of the at least one second text to obtain an average vector of the second text, and performing linear transformation on the average vector of the second text to obtain an average weight of the second text; averaging the text vectors of the at least one third text to obtain an average vector of the third text, and performing linear transformation on the average vector of the third text to obtain an average weight of the third text;
a document retrieval result determination unit, configured to determine the at least one second text as the text retrieval result in response to that the average weight of the second text is greater than or equal to the average weight of the third text; and determining the at least one third text as the text retrieval result in response to the average weight value of the second text being smaller than the average weight of the third text.
Embodiments of the present specification also provide an electronic device, which may include: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above method when executing the program.
Embodiments of the present specification also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the above method.
Therefore, with the text retrieval method and apparatus described above, after the text to be retrieved input by the user is segmented, a word recall is performed according to the segmented words on the one hand, and on the other hand the vector of the text to be retrieved is determined and a vector recall is performed. Combining the word-recall retrieval mode with the vector-recall retrieval mode allows texts related at the semantic level to be retrieved in addition to texts related at the word level; that is, a semantic-level text recall is added on the basis of the word-level text recall, making the text retrieval result more comprehensive and improving the recall rate of text retrieval.
In addition, in the embodiments of the present specification, during word recall the word weight of each word may further be determined, and keywords are extracted from the segmented words according to these weights; that is, unimportant words are removed, and text retrieval in the knowledge base is finally performed using only the keywords. This effectively removes the interference of unimportant words on word-based retrieval, reduces invalid retrievals, makes the retrieval result more accurate, and improves the accuracy of text retrieval. Moreover, because the word weight of each word is determined by a supervised word weight model trained on a large number of pre-labeled data sets, the determined word weights are more accurate, which further improves the accuracy of text retrieval.
Further, in the embodiments of the present specification, the recall results of the word recall and the vector recall may be fused through a fusion model, which ensures that the retrieval result includes the text that best matches the first text input by the user, that is, the "best answer" of the text retrieval.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present specification, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic structural diagram of a text retrieval system 100 according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart diagram of a text retrieval method according to some embodiments of the present disclosure;
FIG. 3 is a schematic flow chart illustrating recalling at least one second text from a knowledge base according to at least one word in accordance with some embodiments of the present description;
FIG. 4 is a schematic flow chart illustrating another method of recalling at least one second text from the knowledge base according to at least one word, in accordance with some embodiments of the present description;
FIG. 5 is a schematic flow chart illustrating a method for fusing at least one second text and at least one third text according to some embodiments of the present disclosure;
FIG. 6 is a flowchart of a method for training a word weight model according to an embodiment of the present disclosure;
FIG. 7 illustrates an internal structure of a word weight model according to an embodiment of the present description;
FIG. 8 is a flowchart illustrating a method for training a text vector model according to an embodiment of the present disclosure;
FIG. 9 illustrates an internal structure of a text vector model according to an embodiment of the present description;
FIG. 10 is a flow chart of another method for training a text vector model according to an embodiment of the present disclosure;
FIG. 11 illustrates an internal structure of a text vector model according to an embodiment of the present description;
FIG. 12 is a flowchart of a method for training a fusion model according to an embodiment of the present disclosure;
FIG. 13 is a view showing an internal structure of a fusion model according to an embodiment of the present disclosure;
fig. 14 shows an internal structure of a text retrieval device according to an embodiment of the present specification.
Detailed Description
To make the objects, technical solutions and advantages of the present specification more apparent, the present specification is further described in detail below with reference to the accompanying drawings in combination with specific embodiments.
It should be noted that technical terms or scientific terms used in the embodiments of the present specification should have a general meaning as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
Fig. 1 shows a structure of a text retrieval system 100 according to an embodiment of the present specification. As shown in fig. 1, the text retrieval system 100 may include: at least one client 102, a server (also referred to as a text retrieval device) 104, and a knowledge base 106.
The client 102 is configured to provide a user interface for a user, receive a text to be retrieved input by the user, forward the text to be retrieved to the server 104, and feed back a retrieval result for the text to be retrieved, which is received from the server 104, to the user.
The server 104 is configured to receive a text to be retrieved input by a user from the client 102, perform a series of processing on the text to be retrieved, recall a certain number of texts from the knowledge base 106 according to a processing result, determine a retrieval result for the text to be retrieved from the texts, and return the determined retrieval result to the client 102.
The knowledge base 106 is used to store a large amount of preset text. In general, the knowledge base 106 may be viewed as a database or data set that stores text and defines the scope of text retrieval. This scope may be the same for all users, i.e., different users may correspond to the same knowledge base. In addition, in some embodiments of the present specification, because different users pay different degrees of attention to different types of information, different knowledge bases may also be set for different users; that is, for the same text to be retrieved, the texts retrieved from the respective knowledge bases of different users may differ. In the embodiments of the present specification, the range of text retrieval for a given user is referred to as that user's knowledge base.
Fig. 2 is a flow chart of a text retrieval method according to some embodiments of the present disclosure. The method may be performed by the server 104 of fig. 1. As shown in fig. 2, the method may include:
in step 202, the received first text is segmented to obtain at least one word.
In an embodiment of the present specification, the first text may be a text to be retrieved, which is input by the user through the client 102. The first text may be, for example, a question or a sentence, etc. After receiving the first text, the client 102 sends the first text to the server 104 for text retrieval within a range of a preset knowledge base.
In the embodiments of the present specification, the above-mentioned first text may be segmented by using various methods, for example, a dictionary-based segmentation method, a statistical-based segmentation method, a rule-based segmentation method, a word tagging-based segmentation method, an understanding-based segmentation method, and the like. The text retrieval scheme described in the embodiments of the present specification does not limit the specific word segmentation method used.
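For illustration only, a minimal sketch of this step is given below. It assumes the open-source jieba tokenizer purely as one example of a dictionary/statistics-based segmenter; the embodiments do not prescribe any particular tool, and the function name segment is hypothetical.

```python
# Illustrative sketch only: segment the received first text into at least one word.
# jieba stands in for any of the segmentation methods mentioned above.
import jieba

def segment(first_text):
    """Return the list of words obtained by segmenting the first text."""
    return [w for w in jieba.lcut(first_text) if w.strip()]

if __name__ == "__main__":
    # Example query; the exact tokens depend on jieba's dictionary.
    print(segment("我的钱丢了怎么办呢"))
```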
At step 204, at least one second text is recalled from the knowledge base based on the at least one term.
In embodiments of the present specification, the recalling of the at least one second text from the knowledge base according to the at least one word in step 204 above may be implemented in various ways. Specifically, the word weight of the at least one word may be determined first; then, determining the keywords of the first text according to the determined word weight; and finally, recalling the text by using the determined keywords.
FIG. 3 illustrates a flow of a method for recalling at least one second text from a knowledge base based on at least one term in accordance with some embodiments of the present description. As shown in fig. 3, the method may specifically include:
in step 302, the at least one word is respectively input into the trained word weight model to obtain the word weight of the at least one word;
at step 304, determining at least one keyword from the at least one word according to the word weight of the at least one word; and
at step 306, at least one second text is recalled from the knowledge base based on the at least one keyword.
In an embodiment of the present specification, the word weight model may include: an encoder and a linear transform layer; the encoder is used for encoding the at least one word respectively to obtain a word vector of the at least one word; and the linear transformation layer respectively carries out linear transformation on the word vector of the at least one word to obtain the word weight of the at least one word.
In particular, the encoder may be implemented by a variety of machine learning models, such as at least one of a trained BERT model, a Convolutional Neural Network (CNN) model, or a Long Short Term Memory (LSTM) model. The trained encoder can encode a plurality of words obtained by word segmentation into a plurality of word vectors respectively, and has better performance. The training method for the above-described encoder will be described in detail later.
In the embodiments of the present specification, the linear transformation layer is provided to characterize the word vector of each word with a single value, and this value can characterize the importance of the word.
In some embodiments of the present description, the linear transform layer may be a 1 × N or N × 1 coefficient matrix; wherein N is a dimension of a word vector of the at least one word. In this case, the trained linear transformation layer may perform linear transformation on a word vector to obtain a real value, and may directly use the real value as the word weight of the word.
In other embodiments of the present disclosure, in addition to the coefficient matrix, the linear transformation layer may further include a normalization unit configured to normalize the real value to obtain a word weight with a value range of [0, 1]. For convenience of description, the real value may be referred to as the word weight value of a word in the embodiments of the present specification. The normalization unit may be implemented in various ways; for example, the word weight values corresponding to the at least one word may be normalized using an S-shaped growth curve (a Sigmoid function). After normalization, the value range of the word weight of each word is [0, 1]. In this way, the importance of different words can be compared more directly, making the word weights more convenient to use in practical applications.
In some embodiments of the present disclosure, when the word weight of the word is a real value without being normalized, the average value of the word weight of the at least one word may be used to determine a keyword threshold, for example, the keyword threshold may be the average value of the word weight of the at least one word. When the word weight of a word is greater than or equal to the keyword threshold, the word is determined to be a keyword.
In other embodiments of the present description, when the word weight of a word is a normalized value with a value range of [0,1], a keyword threshold value, for example, 0.5, may be preset. When the word weight of a word is greater than or equal to the keyword threshold, the word is determined to be a keyword.
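As a hedged sketch of the word weight model and keyword selection described above, the code below uses a plain embedding lookup as a stand-in for the trained BERT/CNN/LSTM encoder, an N × 1 linear layer as the linear transformation layer, and a Sigmoid as the normalization unit; all class, function, and variable names are illustrative assumptions rather than the claimed implementation.

```python
# Hedged sketch of the word weight model: encoder -> word vectors -> linear
# transformation -> Sigmoid-normalized word weights in [0, 1], followed by
# keyword selection against a preset threshold. The embedding layer merely
# stands in for a trained BERT/CNN/LSTM encoder.
import torch
import torch.nn as nn

class WordWeightModel(nn.Module):
    def __init__(self, vocab_size=10000, dim=128):
        super().__init__()
        self.encoder = nn.Embedding(vocab_size, dim)  # placeholder encoder
        self.linear = nn.Linear(dim, 1)               # N x 1 linear transformation layer
        self.normalize = nn.Sigmoid()                 # normalization unit

    def forward(self, word_ids):
        word_vectors = self.encoder(word_ids)            # (num_words, dim)
        raw_weights = self.linear(word_vectors)           # real-valued word weight values
        return self.normalize(raw_weights).squeeze(-1)    # word weights in [0, 1]

def select_keywords(words, weights, threshold=0.5):
    """Keep the words whose word weight reaches the preset keyword threshold."""
    return [w for w, s in zip(words, weights.tolist()) if s >= threshold]

if __name__ == "__main__":
    model = WordWeightModel()
    words = ["my", "money", "lost", "this", "how", "ne"]
    word_ids = torch.arange(len(words))   # hypothetical word ids
    print(select_keywords(words, model(word_ids)))
```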
FIG. 4 illustrates a flow of a method for recalling at least one second text from a knowledge base based on at least one term in accordance with some embodiments of the present description. As shown in fig. 4, the method may specifically include:
in step 402, determining a word weight of the at least one word according to a word frequency-inverse text frequency index (TF-IDF) algorithm;
at step 404, determining at least one keyword from the at least one word according to the word weight of the at least one word; and
at step 406, at least one second text is recalled from the knowledge base based on the at least one keyword.
In the embodiments of the present specification, as can be seen from fig. 3 and fig. 4 above, the word recall may be implemented by keyword matching. This is because every text stored in the knowledge base is labeled in advance with one or more corresponding keywords. Therefore, in steps 306 and 406, texts whose keywords match at least one keyword of the first text can be found in the knowledge base through any of various keyword matching methods; the found texts are scored according to a preset scoring policy, and the texts with the highest scores are returned to the client 102 as the recall result (i.e., the second texts). The number of second texts to be returned may be set in advance. The embodiments of the present specification do not limit the specific keyword matching method used by the server 104. In the embodiments of the present specification, this text recall manner is simply referred to as word recall.
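For illustration, a minimal sketch of the TF-IDF word weighting of step 402 and the keyword-matching recall of steps 306/406 is given below; the smoothed IDF formula and the overlap-count scoring policy are assumptions, since the embodiments do not fix a particular scoring policy, and the function names are hypothetical.

```python
# Hedged sketch: TF-IDF word weights for the query words, then keyword-matching
# recall against knowledge-base texts assumed to be pre-labeled with keywords.
import math
from collections import Counter

def tfidf_weights(query_words, corpus_tokenized):
    """Word weight of each query word over a tokenized corpus (smoothed IDF)."""
    n_docs = len(corpus_tokenized)
    tf = Counter(query_words)
    weights = {}
    for w in set(query_words):
        df = sum(1 for doc in corpus_tokenized if w in doc)
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        weights[w] = (tf[w] / len(query_words)) * idf
    return weights

def recall_by_keywords(query_keywords, knowledge_base, top_k=3):
    """knowledge_base: list of (text, labeled_keyword_set). Score = shared keywords."""
    scored = [(len(set(query_keywords) & kws), text) for text, kws in knowledge_base]
    scored = [item for item in scored if item[0] > 0]
    return [text for _, text in sorted(scored, reverse=True)[:top_k]]
```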
It should be noted that the keywords of the text stored in the knowledge base may be labeled in advance, for example, manually or by other means, or certainly, the keywords of each text in the knowledge base may be predetermined and labeled through the word weight model given in this specification.
It can be seen from the above embodiments that, in the word recall method described above, after the text to be retrieved input by the user is segmented, the word weight of each word is determined and keywords are extracted from the segmented words according to these weights; that is, unimportant words are removed, and only the keywords are used to retrieve texts from the knowledge base. This effectively removes the interference of unimportant words on word-based retrieval, reduces invalid text recalls, makes the retrieval result more accurate, and thus improves the accuracy of text retrieval. In addition, because some embodiments determine the word weight of each word with a supervised word weight model trained on a large number of pre-labeled data sets, the determined word weights are more accurate, which further improves the accuracy of text retrieval.
In step 206, the at least one word is input into the trained text vector model to obtain a text vector of the first text.
In embodiments of the present description, the text vector model described above may be implemented by a variety of machine learning models, such as at least one of a trained BERT model, CNN model, or LSTM model.
The trained text vector model can encode a plurality of words obtained by word segmentation into a text vector and has better performance. The training method for the text vector model will be described in detail later.
At step 208, at least one third text is recalled from the knowledge base based on the vector of the first text.
In the embodiments of the present specification, text vectors have been extracted in advance from the texts stored in the knowledge base and a vector index has been established. Therefore, in step 208, a vector search may be performed with the vector index according to the determined vector of the first text, to find texts whose text vectors match the text vector of the first text; the found texts are scored according to a preset scoring policy, and the texts with the highest scores are returned to the client 102 as the recall result (i.e., the third texts). The number of third texts to be returned may be set in advance. There are many methods for vector search using a vector index; for example, it can be implemented with the HNSW algorithm or with the Elasticsearch search engine. The embodiments of the present specification do not limit the specific vector search method used by the server 104. In the embodiments of the present specification, the text recall manner described in steps 206 and 208 is simply referred to as vector recall.
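A minimal sketch of such a vector recall using the HNSW algorithm (via the open-source hnswlib package) follows; the index parameters and the assumption that knowledge-base text vectors are available as a NumPy array are illustrative only, and an Elasticsearch-based implementation would be an equally valid choice.

```python
# Hedged sketch of vector recall with an HNSW index. Knowledge-base text vectors
# are assumed to have been produced in advance by the trained text vector model.
import numpy as np
import hnswlib

def build_vector_index(kb_vectors):
    dim = kb_vectors.shape[1]
    index = hnswlib.Index(space="cosine", dim=dim)
    index.init_index(max_elements=len(kb_vectors), ef_construction=200, M=16)
    index.add_items(kb_vectors, np.arange(len(kb_vectors)))
    index.set_ef(64)
    return index

def recall_by_vector(index, first_text_vector, top_k=3):
    labels, _ = index.knn_query(first_text_vector, k=top_k)
    return labels[0].tolist()   # ids of the recalled third texts in the knowledge base
```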
It should be noted that step 204 and steps 206-208 may be two processes executed in parallel, and the order of the step numbers does not represent the order of execution.
In step 210, after obtaining the at least one second text and the at least one third text, the server 104 may further fuse the at least one second text and the at least one third text to obtain a text retrieval result.
In the embodiments of the present specification, the above fusion may also be achieved by various methods.
In some embodiments of the present disclosure, the server 104 may take the union of the at least one second text and the at least one third text, that is, merge the two sets and remove duplicates, to obtain the text retrieval result.
In other embodiments of the present description, the server 104 may implement the above fusion by using a fusion model, and a specific fusion process may be as shown in fig. 5, including:
in step 502, the at least one second text and the at least one third text are respectively input into the trained text vector model, and a text vector of the at least one second text and a text vector of the at least one third text are determined.
In an embodiment of the present specification, the text vector model may be the text vector model described in step 206.
In step 504, the text vectors of the at least one second text are averaged to obtain an average vector of the second text, and the average vector of the second text is linearly transformed to obtain an average weight of the second text.
In step 506, the text vectors of the at least one third text are averaged to obtain an average vector of the third text, and the average vector of the third text is linearly transformed to obtain an average weight of the third text.
In step 508, in response to that the average weight of the second text is greater than or equal to the average weight of the third text, the at least one second text is taken as a text retrieval result; and in response to the average weight value of the second text being smaller than the average weight value of the third text, taking the at least one third text as a text retrieval result.
In an embodiment of the present specification, the operation of linearly transforming the average vector of the second text and the average vector of the third text may be implemented by a trained linear transformation model. The aim of training the linear transformation model is to make the average weight of the group of texts that contains the text best matching the first text greater than the average weight of the other group; that is, the group containing the "best answer" receives the greatest average weight. In this way, the group with the greater average weight can be selected as the retrieval result according to the average weights of the second texts and the third texts, which ensures that the output contains the text best matching the first text, i.e., the "best answer" of the text retrieval, so that the user obtains the best text retrieval result.
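A compact sketch of this fusion step (FIG. 5) is shown below; the untrained nn.Linear layer merely stands in for the trained linear transformation model, and all function and variable names are illustrative assumptions.

```python
# Hedged sketch of fusion: average each group's text vectors, map the average
# vector to a scalar average weight with a (trained) linear transformation, and
# return the group with the larger average weight.
import torch
import torch.nn as nn

def fuse(second_vectors, third_vectors, second_texts, third_texts, linear):
    """second_vectors/third_vectors: (num_texts, dim) tensors from the text vector model."""
    avg_second = second_vectors.mean(dim=0)
    avg_third = third_vectors.mean(dim=0)
    weight_second = linear(avg_second).item()   # average weight of the second texts
    weight_third = linear(avg_third).item()     # average weight of the third texts
    return second_texts if weight_second >= weight_third else third_texts

if __name__ == "__main__":
    dim = 128
    linear = nn.Linear(dim, 1)   # stands in for the trained linear transformation model
    result = fuse(torch.randn(3, dim), torch.randn(4, dim),
                  ["doc_a", "doc_b", "doc_c"], ["doc_d", "doc_e", "doc_f", "doc_g"],
                  linear)
    print(result)
```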
Therefore, the embodiments of the present specification combine the word-recall retrieval mode with the vector-recall retrieval mode, so that texts related at both the word level and the semantic level can be retrieved; that is, a semantic-level text recall is added on the basis of the word-level text recall, making the text retrieval result more comprehensive and improving the recall rate of text retrieval.
Further, as described above, the fusion model can ensure that the retrieval result includes the text that best matches the first text, that is, the "best answer" of the text retrieval.
The training method of each model is described in detail below with reference to specific examples.
Fig. 6 shows a flow of a method for training a word weight model according to an embodiment of the present disclosure. As shown in fig. 6, the method may include:
in step 602, training data is obtained, where the training data includes a plurality of training texts and a known output corresponding to each training text, where each training text includes at least one second word; the known output is a word weight of the at least one second word.
In an embodiment of the present specification, each second word included in the training texts is labeled with an importance label, and the importance label identifies the importance degree of the word. In this case, the known output may specifically be: the importance of the at least one second word.
The following steps 604 to 610 are performed for each training text:
inputting at least one second word included in the training text into the encoder, and generating a word vector of the at least one second word according to the current value of the parameter of the encoder in step 604;
in step 606, inputting the word vector of the at least one second word into the linear transformation layer, and generating a word weight of the at least one second word according to the current value of the parameter of the linear transformation layer as a prediction output of the training text;
at step 608, determining a gradient based on an error between the predicted output and the known output of the training text; and
at step 610, the gradient is backpropagated to the encoder and the linear transform layer to jointly adjust current values of parameters of the encoder and the linear transform layer.
As mentioned above, the known output may specifically be the importance of the at least one second word. Therefore, when the predicted output does not match the known output, the result of the training iteration is regarded as a "penalty" and the current values of the model parameters are adjusted accordingly; when the predicted output is consistent with the known output, the result is regarded as a "reward" and the current values of the model parameters are adjusted accordingly.
The training process may be ended when the training reaches a predetermined number of times or the model converges.
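A hedged sketch of this training loop is given below. It reuses the WordWeightModel sketch shown earlier (any encoder plus linear layer module would do), and binary cross-entropy against 0/1 importance labels is only one plausible loss, since the embodiments do not specify one.

```python
# Hedged training sketch for the word weight model (steps 602-610): forward pass,
# error against the known importance labels, gradient back-propagated so that the
# encoder and the linear transformation layer are adjusted jointly.
import torch
import torch.nn as nn

def train_word_weight_model(model, training_data, epochs=3, lr=1e-3):
    """training_data: iterable of (word_id_tensor, importance_label_tensor in {0, 1})."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)   # covers encoder + linear layer
    loss_fn = nn.BCELoss()
    for _ in range(epochs):
        for word_ids, labels in training_data:
            predicted = model(word_ids)          # predicted word weights in [0, 1]
            loss = loss_fn(predicted, labels.float())
            optimizer.zero_grad()
            loss.backward()                      # gradient to encoder and linear layer
            optimizer.step()
    return model
```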
Fig. 7 shows an internal structure of a word weight model according to an embodiment of the present specification. As shown in fig. 7, the word weight model may include:
an input layer 702 for receiving at least one word and its corresponding known output. For example, in fig. 6, the at least one term includes: the words "my", "money", "lost", "this", "how" and "woollen" are six words, wherein "money" and "lost" are words labeled as important words with higher word weights. The words labeled as important are indicated in fig. 6 using shaded boxes.
The encoder 704 is configured to encode the at least one word and output a word vector of the at least one word. For example, word vector 1 through word vector 6 are shown in FIG. 7.
And a linear transformation layer 706, configured to perform linear transformation on the word vector of the at least one word to obtain a word weight of the at least one word, and use the word weight as a prediction output.
For example, the word weight 1 to the word weight 6 shown in fig. 7.
A comparison layer 708, configured to determine a gradient based on the error between the predicted output and the known output, and back-propagate the gradient to the encoder 704 and the linear transform layer 706 to adjust the current values of their parameters. For example, in fig. 7, the known output requires the word weights of "money" and "lost" to be greater than those of "my", "this", "how", and "ne", so the comparison layer 708 compares the predicted output with the known output and jointly adjusts the encoder 704 and the linear transformation layer 706 based on the error between the two.
As described above, the encoder 704 may be implemented by at least one of a BERT model, a CNN model, and an LSTM model. The linear transform layer 706 may be a 1 × N or N × 1 coefficient matrix, where N is the dimension of the word vector; alternatively, the linear transform layer 706 may include a 1 × N or N × 1 coefficient matrix together with a normalization unit.
It can be seen that after the training is completed, the word weight output by the word weight model can accurately represent the importance degree of the word.
Fig. 8 shows a method flow for training a text vector model according to an embodiment of the present disclosure. As shown in fig. 8, the method may include:
in step 802, second training data is obtained, the second training data including a plurality of sets of training text pairs and known outputs corresponding to each of the training text pairs.
In an embodiment of the present specification, each of the training text pairs described above carries a matching degree label. In this case, the known output may specifically be: the matching degree of the training text pair output by the model should be consistent with the matching degree label of the pair. That is, when the matching degree label identifies the training text pair as matching, the matching degree output by the model should be greater than a predetermined threshold; and when the matching degree label identifies the training text pair as not matching, the matching degree output by the model should be less than or equal to the predetermined threshold.
For each training text pair, the following steps 804 to 810 are performed:
in step 804, a first training text and a second training text in the training text pair are respectively input into a text vector model, and a first training text vector corresponding to the first training text and a second training text vector corresponding to the second training text are generated according to current values of parameters of the text vector model;
in step 806, determining the matching degree of the first training text and the second training text according to the first training text vector and the second training text vector, and taking the matching degree as the prediction output of the training text pair;
determining a gradient based on an error between the predicted output and the known output of the training text pair, step 808; and
at step 810, the gradient is propagated back to the text vector model to adjust the current values of the parameters of the text vector model.
As mentioned above, the known output may specifically be that the matching degree of the training text pair output by the model is consistent with the matching degree label of the pair. Therefore, when the predicted output is inconsistent with the known output, the result of the current training iteration is regarded as a penalty and the current values of the model parameters are adjusted accordingly; when the predicted output is consistent with the known output, the result is regarded as a reward and the current values of the model parameters are adjusted accordingly.
The training process may be ended when the training reaches a predetermined number of times or the model converges.
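For illustration, a hedged sketch of this pairwise training loop follows; cosine similarity (rescaled to [0, 1]) as the matching degree and binary cross-entropy as the loss are assumptions, and `encoder` stands in for the BERT/CNN/LSTM text vector model.

```python
# Hedged training sketch for the text vector model (steps 802-810): encode both
# texts of each pair, compute a matching degree, and back-propagate the error
# against the matching-degree label.
import torch
import torch.nn.functional as F

def train_text_vector_model(encoder, pairs, epochs=3, lr=1e-3):
    """pairs: iterable of (first_text_input, second_text_input, match_label in {0, 1})."""
    optimizer = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(epochs):
        for first, second, label in pairs:
            v1 = encoder(first)                                  # first training text vector
            v2 = encoder(second)                                 # second training text vector
            match = (F.cosine_similarity(v1, v2, dim=-1) + 1) / 2  # matching degree in [0, 1]
            loss = F.binary_cross_entropy(match, torch.tensor(float(label)))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return encoder
```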
Fig. 9 shows an internal structure of a text vector model according to an embodiment of the present specification. As shown in fig. 9, the text vector model may include:
a second input layer 902 for receiving a text pair and its corresponding known output. For example, in FIG. 9, the first text in the text pair is "what did my money get lost"; the second text is "what she lost her bag": and, the matching degree label of the text pair is not matching.
And a second encoder 904, configured to encode texts in the text pairs respectively, and output two text vectors. For example, text vector 1 and text vector 2 are shown in fig. 9.
And a matching degree calculation layer 906, configured to determine a matching degree of the two text vectors, and output the matching degree as a prediction.
A second comparison layer 908 for determining a gradient based on an error between the predicted output and the known output; and back-propagating the gradient to the encoder to adjust a current value of a parameter of the encoder.
For example, in fig. 9, the known output is that "what did my money get lost" and "what she lost her bag" do not match; therefore, the second comparison layer 908 compares the predicted output with the known output and adjusts the second encoder 904 based on the error between the two.
As described above, the second encoder 904 may be implemented by at least one of a BERT model, a CNN model, and an LSTM model.
It can be seen that after training is completed, the text vector model can encode a text into a text vector, and the text vector can accurately express the semantics of the text.
FIG. 10 illustrates another method flow for training a text vector model as described herein. As shown in fig. 10, the method may include:
in step 1002, second training data is obtained, the second training data including a plurality of sets of training text pairs and known outputs corresponding to each of the training text pairs;
for each training text pair, the following steps 1004-1010 are performed:
in step 1004, the first training text and the second training text in the training text pair are respectively input into a BERT model, and a first training text vector corresponding to the first training text and a second training text vector corresponding to the second training text are generated according to the current values of the parameters of the BERT model;
in step 1006, determining the matching degree of the first training text and the second training text according to the first training text vector and the second training text vector, and taking the matching degree as the prediction output of the training text pair;
at step 1008, determining a gradient based on an error between the predicted output and the known output of the training text pair;
at step 1010, the gradient is back-propagated to the BERT model to adjust the current values of the parameters of the BERT model.
The specific implementation of steps 1002-1010 can refer to steps 802-810.
At step 1012, after the training of the BERT model is completed, a text vector model is trained using a model distillation method according to the trained BERT model.
In the embodiments of the present specification, the text vector model may be implemented by a relatively simple machine learning model such as a CNN model or an LSTM model.
The reason for using model distillation is that the BERT model has a complicated structure, consumes relatively large amounts of resources, and takes relatively long to compute. In order to vectorize texts quickly in application, only the BERT model is used during model training; after the BERT model training is completed, a model with a simpler structure and faster operation speed, such as a CNN model or an LSTM model, learns from the trained BERT model through model distillation. In practical application, the CNN model or LSTM model trained by model distillation is used to encode the input text into a text vector, achieving an effect close to that of the BERT model. Specifically, in the model distillation process, the same text may be input to the CNN or LSTM model and to the trained BERT model; then, through distillation learning, the parameters of the CNN or LSTM model are adjusted according to the error between the vector output by the CNN or LSTM model and the vector output by the BERT model, thereby completing the training of the CNN or LSTM model.
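A hedged sketch of this distillation step follows; mean-squared error between the student's and teacher's text vectors is used as the distillation loss, which is an assumption, and `teacher`/`student` are placeholder names for the trained BERT model and the CNN/LSTM text vector model.

```python
# Hedged model-distillation sketch: feed the same text to the trained BERT
# teacher and to the lighter CNN/LSTM student, then update only the student so
# that its text vector approaches the teacher's.
import torch
import torch.nn as nn

def distill(teacher, student, texts, epochs=3, lr=1e-3):
    teacher.eval()
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for text in texts:                        # identical input to both models
            with torch.no_grad():
                target_vector = teacher(text)     # text vector from the trained BERT model
            student_vector = student(text)        # text vector from the CNN/LSTM model
            loss = loss_fn(student_vector, target_vector)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```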
FIG. 11 illustrates an internal structure of a text vector model according to an embodiment of the present disclosure. As shown in fig. 11, in addition to the second input layer 902, the second encoder 904, the matching degree calculation layer 906, and the second comparison layer 908, the structure may further include a text vector model 910, which learns from the second encoder 904 by model distillation. As mentioned above, the second encoder 904 may be a BERT model, and the text vector model 910 may be a CNN model or an LSTM model.
In an embodiment of the present specification, in order to train the text vector model 910 by model distillation, the same text, for example, "what did my money get lost", may be input into both the text vector model 910 and the trained BERT model; then, by distillation learning, the parameters of the text vector model 910 are adjusted according to the error between the text vector output by the text vector model 910 (e.g., text vector 3 in fig. 11) and the text vector output by the BERT model (e.g., text vector 2 in fig. 11), thereby completing the training of the text vector model 910.
FIG. 12 illustrates a process flow of a method of training a fusion model as described herein. As shown in fig. 12, the method may include:
at step 1202, third training data is obtained, where the third training data includes a plurality of pairs of training text sets and known outputs corresponding to each pair of training text sets;
in an embodiment of the present specification, each of the training text sets includes at least two training text sets: the first training text group and the second training text group, and one text in one of the training text groups is labeled with a label. At this time, the known outputs may be specifically: the average weight of the training text set containing the labeled text output by the model should be greater than the average weight of the other training text sets.
For each pair of training text groups, the following steps 1204-1214 are performed respectively:
in step 1204, inputting a first training text group and a second training text group into the trained text vector model respectively, and determining a text vector of each training text in the first training text group and a text vector of each training text in the second training text group;
in step 1206, averaging the text vectors of each training text in the first training text group to obtain an average vector of the first training text group;
in step 1208, averaging the text vectors of each training text in the second training text group to obtain an average vector of the second training text group;
in step 1210, inputting the average vector of the first training text group and the average vector of the second training text group into a linear transformation model, respectively, and generating the average weight of the first training text group and the average weight of the second training text group according to the current values of the parameters of the linear transformation model as the predicted outputs of the pair of training text groups;
at step 1212, determining a gradient based on an error between the predicted output and the known output;
at step 1214, the gradient is backpropagated to the linear transformation model to adjust current values of parameters of the linear transformation model.
As mentioned above, the known output may specifically be that the average weight output by the model for the training text group containing the labeled text is larger than that of the other training text group. Therefore, when the predicted output is inconsistent with the known output, the result of the current training iteration is regarded as a penalty and the current values of the model parameters are adjusted accordingly; when the predicted output is consistent with the known output, the result is regarded as a reward and the current values of the model parameters are adjusted accordingly.
The training process may be ended when the training reaches a predetermined number of times or the model converges.
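A hedged sketch of this training loop for the linear transformation model follows; the margin ranking loss is one way to express the known output (the labeled group should receive the larger average weight) and is an assumption, as are the function and variable names.

```python
# Hedged training sketch for the fusion model's linear transformation
# (steps 1202-1214): average each group's text vectors, compute both average
# weights, and push the weight of the group containing the labeled text above
# the other group's weight.
import torch
import torch.nn as nn

def train_fusion_linear(linear, group_pairs, epochs=3, lr=1e-3):
    """group_pairs: iterable of (labeled_group_vectors, other_group_vectors) tensors."""
    optimizer = torch.optim.Adam(linear.parameters(), lr=lr)
    loss_fn = nn.MarginRankingLoss(margin=0.1)
    for _ in range(epochs):
        for labeled_vectors, other_vectors in group_pairs:
            w_labeled = linear(labeled_vectors.mean(dim=0))   # average weight, labeled group
            w_other = linear(other_vectors.mean(dim=0))       # average weight, other group
            target = torch.ones_like(w_labeled)               # require w_labeled > w_other
            loss = loss_fn(w_labeled, w_other, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return linear
```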
Fig. 13 shows an internal structure of a fusion model according to an embodiment of the present specification. As shown in fig. 13, the fusion model may include:
a third input layer 1302 for receiving two text groups and their corresponding known outputs. For example, in fig. 13, the first text group in the text groups is m texts recalled by words; the second text group is n texts recalled by the vectors, and one text in the first text group is labeled with a label.
And the text vector model 1304 is used for respectively encoding the texts in the two text groups and outputting two groups of text vectors.
The average layer 1306 is configured to average the two groups of text vectors to obtain a first group of text average text vectors and a second group of text average text vectors.
The second linear coding model 1308 is configured to perform linear transformation on the determined first group of text average text vectors and the determined second group of text average text vectors to obtain average weights of the two text groups, and the average weights are used as prediction outputs.
A third comparison layer 1310, configured to determine a gradient based on the error between the predicted output and the known output, and back-propagate the gradient to the second linear coding model 1308 to adjust the current values of its parameters. For example, in FIG. 13, the known output is that the average weight of the first text group should be greater than the average weight of the second text group, so the third comparison layer 1310 compares the predicted output with the known output and adjusts the second linear coding model 1308 according to the error between the two.
It can be seen that, after training is completed, the fusion model described above can select, from the groups of recall results, the group that includes the "best answer".
Corresponding to the above method, an embodiment of the present specification further provides a text retrieval apparatus, whose internal structure is shown in fig. 14. The text retrieval apparatus may include:
a word segmentation module 1402, configured to perform word segmentation on the received first text to obtain at least one word;
a term recall module 1404 for recalling at least one second text from the knowledge base in accordance with the at least one term;
a text vector generation module 1406, configured to input the at least one word into a trained text vector model to obtain a text vector of the first text;
a vector recall module 1408 for recalling at least one third text from the knowledge base according to the vector of the first text; and
and a fusion module 1410, configured to fuse the at least one second text and the at least one third text to obtain a text retrieval result.
In some embodiments of the present specification, the word recall module includes:
a trained word weight model, configured to receive the at least one word respectively and output the word weight of the at least one word;
a keyword determining unit, configured to determine at least one keyword from the at least one word according to a word weight of the at least one word; and
a word recall unit for recalling at least one second text from the knowledge base according to the at least one keyword.
In an embodiment of the present specification, the word weight model includes:
an encoder and a linear transform layer; wherein,
the encoder is used for encoding the at least one word respectively to obtain a word vector of the at least one word;
and the linear transformation layer respectively carries out linear transformation on the word vector of the at least one word to obtain the word weight of the at least one word.
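For illustration only, a minimal sketch in Python with PyTorch of such a word weight model is given below; the embedding-table encoder, the vocabulary size, the vector dimension, and the name WordWeightModel are assumptions made for this sketch (in practice any trained encoder could produce the word vectors).

import torch
import torch.nn as nn

class WordWeightModel(nn.Module):
    def __init__(self, vocab_size=5000, dim=128):
        super().__init__()
        self.encoder = nn.Embedding(vocab_size, dim)  # stand-in encoder producing word vectors
        self.linear = nn.Linear(dim, 1)               # linear transformation layer

    def forward(self, word_ids: torch.Tensor) -> torch.Tensor:
        word_vectors = self.encoder(word_ids)     # (num_words, dim): word vector of each word
        word_weights = self.linear(word_vectors)  # (num_words, 1): linear transformation
        return word_weights.squeeze(-1)           # one word weight per word

# The words with the largest weights can be kept as keywords for the word recall.
model = WordWeightModel()
weights = model(torch.tensor([12, 7, 301]))       # three segmented words (hypothetical ids)
top2 = torch.topk(weights, k=2).indices           # positions of the two highest-weighted words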
In other embodiments of the present specification, the word recall module includes:
a word weight determination unit for determining a word weight of the at least one word according to a word frequency-inverse text frequency index TF-IDF algorithm;
a keyword determining unit, configured to determine at least one keyword from the at least one word according to a word weight of the at least one word; and
a word recall unit for recalling at least one second text from the knowledge base according to the at least one keyword.
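For illustration only, a minimal sketch in pure Python of TF-IDF-based word weighting and keyword selection is given below; the function name tf_idf_weights, the smoothing used in the IDF term, and the toy corpus are assumptions made for this sketch.

import math
from collections import Counter

def tf_idf_weights(words, corpus):
    # words: segmented words of the received text; corpus: knowledge-base texts as word lists
    tf = Counter(words)
    n_docs = len(corpus)
    weights = {}
    for word, count in tf.items():
        df = sum(1 for doc in corpus if word in doc)     # number of texts containing the word
        idf = math.log((n_docs + 1) / (df + 1)) + 1      # smoothed inverse document frequency
        weights[word] = (count / len(words)) * idf       # term frequency * idf
    return weights

corpus = [["reset", "password"], ["change", "bound", "phone"], ["reset", "payment", "password"]]
weights = tf_idf_weights(["reset", "payment", "password"], corpus)
keywords = sorted(weights, key=weights.get, reverse=True)[:2]   # keep the two highest-weighted words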
In some embodiments of the present description, the fusion module includes:
a union unit for taking the union of the at least one second text and the at least one third text to obtain the text retrieval result.
In other embodiments of the present disclosure, the fusion module includes:
a trained text vector model, configured to respectively encode the at least one second text and the at least one third text and determine a text vector of the at least one second text and a text vector of the at least one third text;
a text average weight determination module, configured to average the text vectors of the at least one second text to obtain an average vector of the second text and perform a linear transformation on the average vector to obtain an average weight of the second text, and to average the text vectors of the at least one third text to obtain an average vector of the third text and perform a linear transformation on that average vector to obtain an average weight of the third text;
a text retrieval result determination unit, configured to determine the at least one second text as the text retrieval result in response to the average weight of the second text being greater than or equal to the average weight of the third text, and to determine the at least one third text as the text retrieval result in response to the average weight of the second text being less than the average weight of the third text.
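For illustration only, a minimal sketch in Python with NumPy of this fusion decision is given below; the function names, the vector dimension, and the randomly generated stand-in vectors are assumptions made for this sketch (in practice the text vectors would come from the trained text vector model and the linear transformation parameters from training).

import numpy as np

def group_average_weight(text_vectors: np.ndarray, w: np.ndarray, b: float) -> float:
    avg_vector = text_vectors.mean(axis=0)   # average vector of the group of texts
    return float(avg_vector @ w + b)         # linear transformation -> average weight

def fuse(second_texts, second_vectors, third_texts, third_vectors, w, b):
    weight_second = group_average_weight(second_vectors, w, b)
    weight_third = group_average_weight(third_vectors, w, b)
    # the group whose average weight is greater (or equal) is returned as the retrieval result
    return second_texts if weight_second >= weight_third else third_texts

# Random stand-in vectors; in practice they come from the trained text vector model.
rng = np.random.default_rng(0)
result = fuse(["text A", "text B"], rng.normal(size=(2, 8)),
              ["text C"], rng.normal(size=(1, 8)),
              w=rng.normal(size=8), b=0.0)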
Further, in an embodiment of the present specification, the text retrieval apparatus described above may be regarded as an electronic device, and therefore may include: a memory 1400, a processor 1200, an input/output interface 1600, a communication interface 1800, and a bus 2000. The processor 1200, the memory 1400, the input/output interface 1600, and the communication interface 1800 are communicatively coupled to one another within the device via the bus 2000.
The memory 1400 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1400 may store an operating system and other application programs, and may also store the modules provided in the embodiments of the present specification, such as the word segmentation module 1402, the word recall module 1404, the text vector generation module 1406, the vector recall module 1408, and the fusion module 1410. When the technical solution provided in the embodiments of the present specification is implemented by software or firmware, the relevant program code is stored in the memory 1400 and invoked by the processor 1200 for execution.
The processor 1200 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present specification.
The input/output interface 1600 is used for connecting an input/output module to realize information input and output. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1800 is used for connecting a communication module (not shown in the drawings) to enable the device to interact with other devices in a communication manner. The communication module can realize communication in a wired mode (for example, USB, network cable, etc.), and can also realize communication in a wireless mode (for example, mobile network, WIFI, bluetooth, etc.).
Bus 2000 includes a pathway to transfer information between various components of the device, such as processor 1200, memory 1400, input/output interface 1600, and communication interface 1800.
It should be noted that although the above-mentioned device only shows the processor 1200, the memory 1400, the input/output interface 1600, the communication interface 1800 and the bus 2000, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the electronic device embodiment and the computer storage medium embodiment, since they are substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the context of this description, features in the above embodiments or in different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of this description as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the description. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the description, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the description is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the specification, it should be apparent to one skilled in the art that the specification can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present description has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments of the present description are intended to embrace all such alternatives, modifications and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalents, improvements, and the like that may be made within the spirit and principles of the disclosure are intended to be included within the scope of the disclosure.

Claims (19)

1. A method of text retrieval, the method comprising:
performing word segmentation on the received first text to obtain at least one word;
recalling at least one second text from a knowledge base according to the at least one word;
inputting the at least one word into a trained text vector model to obtain a text vector of the first text;
recalling at least one third text from the knowledge base according to the vector of the first text; and
fusing the at least one second text and the at least one third text to obtain a text retrieval result.
2. The method of claim 1, wherein recalling at least one second text from a knowledge base in accordance with the at least one term comprises:
determining word weights of the at least one word respectively;
determining at least one keyword from the at least one word according to the word weight of the at least one word; and
recalling at least one second text from the knowledge base according to the at least one keyword.
3. The method of claim 2, wherein determining the word weight of the at least one word, respectively, comprises: respectively inputting the at least one word into a trained word weight model to obtain the word weight of the at least one word.
4. The method of claim 3, the word weight model comprising:
an encoder and a linear transform layer; wherein,
the encoder is used for encoding the at least one word respectively to obtain a word vector of the at least one word;
and the linear transformation layer respectively carries out linear transformation on the word vector of the at least one word to obtain the word weight of the at least one word.
5. The method of claim 2, wherein determining the word weight of the at least one word, respectively, comprises: determining a word weight of the at least one word according to a word frequency-inverse text frequency index, TF-IDF, algorithm.
6. The method of claim 1, the fusing the at least one second text and the at least one third text comprising:
merging the at least one second text and the at least one third text to obtain the text retrieval result.
7. The method of claim 1, the fusing the at least one second text and the at least one third text comprising:
inputting the at least one second text and the at least one third text into the trained text vector model respectively, and determining a text vector of the at least one second text and a text vector of the at least one third text;
averaging the text vectors of the at least one second text to obtain an average vector of the second text, and performing linear transformation on the average vector of the second text to obtain an average weight of the second text;
averaging the text vectors of the at least one third text to obtain an average vector of the third text, and performing linear transformation on the average vector of the third text to obtain an average weight of the third text;
determining the at least one second text as the text retrieval result in response to the average weight of the second text being greater than or equal to the average weight of the third text; and
determining the at least one third text as the text retrieval result in response to the average weight value of the second text being less than the average weight of the third text.
8. A method of training a word weight model, the method comprising:
acquiring training data, wherein the training data comprises a plurality of training texts and known output corresponding to each training text; wherein each training text comprises at least one second word; the known output is a word weight of the at least one second word;
for each of the training texts,
inputting the at least one second word into an encoder, and generating a word vector of the at least one second word according to the current value of the parameter of the encoder;
inputting the word vector of the at least one second word into a linear transformation layer, generating a word weight of the at least one second word according to the current value of the parameter of the linear transformation layer, and taking the word weight of the at least one second word as the prediction output of the training text;
determining a gradient based on an error between a predicted output and a known output of the training text;
back-propagating the gradient to the encoder and the linear transform layer to jointly adjust current values of parameters of the encoder and the linear transform layer.
9. A method of training a text vector model, the method comprising:
acquiring second training data, wherein the second training data comprises a plurality of groups of training text pairs and known outputs corresponding to the training text pairs; wherein each training text pair comprises a first training text and a second training text; the known output is the matching degree of the first training text and the second training text;
for each of the pairs of training texts,
respectively inputting a first training text and a second training text of the training text pair into the text vector model, and generating a first training text vector corresponding to the first training text and a second training text vector corresponding to the second training text according to the current values of the parameters of the text vector model;
determining the matching degree of the first training text and the second training text according to the first training text vector and the second training text vector, and taking the matching degree as the prediction output of the training text pair;
determining a gradient based on an error between a predicted output and a known output of the training text pair;
propagating the gradient back to the text vector model to adjust current values of parameters of the text vector model.
10. A method of training a text vector model, the method comprising:
acquiring second training data, wherein the second training data comprises a plurality of groups of training text pairs and known outputs corresponding to the training text pairs; wherein each training text pair comprises a first training text and a second training text; the known output is the matching degree of the first training text and the second training text;
for each of the pairs of training texts,
respectively inputting a first training text and a second training text in the training text pair into a BERT model, and generating a first training text vector corresponding to the first training text and a second training text vector corresponding to the second training text according to the current values of parameters of the BERT model;
determining the matching degree of the first training text and the second training text according to the first training text vector and the second training text vector, and taking the matching degree as the prediction output of the training text pair;
determining a gradient based on an error between a predicted output and a known output of the training text pair;
back-propagating the gradient to the BERT model to adjust current values of parameters of the BERT model; and
after the training of the BERT model is completed, the text vector model is trained using model distillation according to the trained BERT model.
11. The method of claim 10, the text vector model comprising a CNN model or an LSTM model.
12. A text retrieval device, the device comprising:
the word segmentation module is used for segmenting the received first text to obtain at least one word;
the word recalling module is used for recalling at least one second text from the knowledge base according to the at least one word;
the text vector generation module is used for inputting the at least one word into a trained text vector model to obtain a text vector of the first text;
the vector recalling module is used for recalling at least one third text from the knowledge base according to the vector of the first text; and
and the fusion module is used for fusing the at least one second text and the at least one third text to obtain a text retrieval result.
13. The device of claim 12, the word recalling module comprising:
the word weight model is used for inputting the at least one word into the trained word weight model respectively to obtain the word weight of the at least one word;
a keyword determining unit, configured to determine at least one keyword from the at least one word according to a word weight of the at least one word; and
and the word recalling unit is used for recalling at least one second text from the knowledge base according to the at least one keyword.
14. The apparatus of claim 13, the word weight model comprising:
an encoder and a linear transform layer; wherein,
the encoder is used for encoding the at least one word respectively to obtain a word vector of the at least one word;
and the linear transformation layer respectively carries out linear transformation on the word vector of the at least one word to obtain the word weight of the at least one word.
15. The device of claim 12, the word recalling module comprising:
a word weight determination unit for determining a word weight of the at least one word according to a word frequency-inverse text frequency index TF-IDF algorithm;
a keyword determining unit, configured to determine at least one keyword from the at least one word according to a word weight of the at least one word; and
and the word recalling unit is used for recalling at least one second text from the knowledge base according to the at least one keyword.
16. The apparatus of claim 12, the fusion module comprising:
a union unit for taking the union of the at least one second text and the at least one third text to obtain the text retrieval result.
17. The apparatus of claim 12, the fusion module comprising:
the trained text vector model is used for respectively coding the at least one second text and the at least one third text and determining a text vector of the at least one second text and a text vector of the at least one third text;
the text average weight determining module is used for averaging the text vectors of the at least one second text to obtain an average vector of the second text, and performing linear transformation on the average vector of the second text to obtain an average weight of the second text; averaging the text vectors of the at least one third text to obtain an average vector of the third text, and performing linear transformation on the average vector of the third text to obtain an average weight of the third text;
a text retrieval result determination unit, configured to determine the at least one second text as the text retrieval result in response to the average weight of the second text being greater than or equal to the average weight of the third text; and determine the at least one third text as the text retrieval result in response to the average weight of the second text being less than the average weight of the third text.
18. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, which when executed by the processor implements the method of any one of claims 1 to 11.
19. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 11.
CN202010086368.XA 2020-02-11 2020-02-11 Text retrieval method, model training method, text retrieval device, and storage medium Active CN111274808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010086368.XA CN111274808B (en) 2020-02-11 2020-02-11 Text retrieval method, model training method, text retrieval device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010086368.XA CN111274808B (en) 2020-02-11 2020-02-11 Text retrieval method, model training method, text retrieval device, and storage medium

Publications (2)

Publication Number Publication Date
CN111274808A true CN111274808A (en) 2020-06-12
CN111274808B CN111274808B (en) 2023-07-04

Family

ID=70999205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010086368.XA Active CN111274808B (en) 2020-02-11 2020-02-11 Text retrieval method, model training method, text retrieval device, and storage medium

Country Status (1)

Country Link
CN (1) CN111274808B (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006309377A (en) * 2005-04-27 2006-11-09 Seiko Epson Corp Document retrieval device, document retrieval method, its program, and recording medium
US20190251084A1 (en) * 2016-10-27 2019-08-15 Huawei Technologies Co., Ltd. Search method and apparatus
CN110019670A (en) * 2017-10-31 2019-07-16 北京国双科技有限公司 A kind of text searching method and device
CN110019668A (en) * 2017-10-31 2019-07-16 北京国双科技有限公司 A kind of text searching method and device
CN109948036A (en) * 2017-11-15 2019-06-28 腾讯科技(深圳)有限公司 A kind of calculation method and device segmenting lexical item weight
CN108733653A (en) * 2018-05-18 2018-11-02 华中科技大学 A kind of sentiment analysis method of the Skip-gram models based on fusion part of speech and semantic information
US20190370273A1 (en) * 2018-06-05 2019-12-05 Sap Se System, computer-implemented method and computer program product for information retrieval
CN109582868A (en) * 2018-11-27 2019-04-05 湖南大学 The search recommended method of preference is clicked based on term vector weighting, support vector regression and user
CN110309278A (en) * 2019-05-23 2019-10-08 泰康保险集团股份有限公司 Keyword retrieval method, apparatus, medium and electronic equipment
CN110309267A (en) * 2019-07-08 2019-10-08 哈尔滨工业大学 Semantic retrieving method and system based on pre-training model
CN110516210A (en) * 2019-08-22 2019-11-29 北京影谱科技股份有限公司 The calculation method and device of text similarity
CN110704621A (en) * 2019-09-25 2020-01-17 北京大米科技有限公司 Text processing method and device, storage medium and electronic equipment
CN112256860A (en) * 2020-11-25 2021-01-22 携程计算机技术(上海)有限公司 Semantic retrieval method, system, equipment and storage medium for customer service conversation content

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398534A (en) * 2021-01-05 2022-04-26 上海邮电设计咨询研究院有限公司 Event cluster text retrieval system
CN114398534B (en) * 2021-01-05 2023-09-12 上海邮电设计咨询研究院有限公司 Event clustering text retrieval system
CN113360613A (en) * 2021-05-31 2021-09-07 维沃移动通信有限公司 Text processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN111274808B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
US11314806B2 (en) Method for making music recommendations and related computing device, and medium thereof
CN110019732B (en) Intelligent question answering method and related device
CN111753167B (en) Search processing method, device, computer equipment and medium
CN105975459B (en) A kind of the weight mask method and device of lexical item
CN111309878B (en) Search type question-answering method, model training method, server and storage medium
CN112329460B (en) Text topic clustering method, device, equipment and storage medium
CN112632226B (en) Semantic search method and device based on legal knowledge graph and electronic equipment
CN110717038B (en) Object classification method and device
CN113806487B (en) Semantic searching method, device, equipment and storage medium based on neural network
CN113297360B (en) Law question-answering method and device based on weak supervised learning and joint learning mechanism
CN112632224B (en) Case recommendation method and device based on case knowledge graph and electronic equipment
CN113515589B (en) Data recommendation method, device, equipment and medium
CN113407814B (en) Text searching method and device, readable medium and electronic equipment
CN111078842A (en) Method, device, server and storage medium for determining query result
CN111274808B (en) Text retrieval method, model training method, text retrieval device, and storage medium
CN113535912B (en) Text association method and related equipment based on graph rolling network and attention mechanism
CN113591490B (en) Information processing method and device and electronic equipment
CN112396091B (en) Social media image popularity prediction method, system, storage medium and application
CN111414755A (en) Network emotion analysis method based on fine-grained emotion dictionary
CN116186220A (en) Information retrieval method, question and answer processing method, information retrieval device and system
CN115952317A (en) Video processing method, device, equipment, medium and program product
CN116049377A (en) Context-aware recommendation system and method based on matrix decomposition and CRNN
CN113704466B (en) Text multi-label classification method and device based on iterative network and electronic equipment
CN116821781A (en) Classification model training method, text analysis method and related equipment
CN115935195B (en) Text matching method and device, computer readable storage medium and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant