WO2023168814A1 - Sentence vector generation method and apparatus, computer device and storage medium - Google Patents


Info

Publication number
WO2023168814A1
WO2023168814A1 (PCT/CN2022/089817; CN2022089817W)
Authority
WO
WIPO (PCT)
Prior art keywords: sentence, sequence, model, context, current
Application number
PCT/CN2022/089817
Other languages: French (fr), Chinese (zh)
Inventor
陈浩
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2023168814A1

Classifications

    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06Q30/0631 Item recommendations

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to sentence vector generation methods, devices, computer equipment and storage media.
  • Sentence embedding, as a vector representation of text data, is widely used in many application scenarios of natural language processing.
  • By mapping text data into a quantifiable vector space, we can obtain sentence vector representations that capture text features, semantics, grammar and other information, and then use vector clustering, classification and other methods to obtain the relationships between sentences, enabling the application of sentence vectors in real-world scenarios.
  • Existing solutions for sentence vector construction mainly include construction methods based on the word vector average and construction methods based on contrastive learning.
  • Construction methods based on the word vector average include word2vec, GloVe, BERT, etc.
  • Construction methods based on contrastive learning build positive samples for contrastive learning using methods such as dropout, token replacement, deletion, back-translation, etc.
  • The inventor realized that the shortcomings of the existing solutions are: 1) the construction method based on the average word vector destroys the dependencies between words in a sentence, so the accuracy of feature extraction is low; 2) in the construction method based on contrastive learning, the similarity between randomly selected negative samples and the original sentences is low, which makes model training insufficiently difficult.
  • As a result, the transfer ability of the model in actual tasks is insufficient, which in turn leads to lower accuracy of the generated sentence vectors.
  • Therefore, this application provides a sentence vector generation method, device, computer equipment and storage medium.
  • Its main purpose is to solve two technical problems in the existing technology: the construction method based on the word vector average has low accuracy in sentence feature extraction, and the construction method based on contrastive learning yields insufficient model transfer ability in actual tasks, resulting in low accuracy of the generated sentence vectors.
  • A sentence vector generation method is provided, which includes:
  • using a pre-built sentence vector generation model, the vector representation of the sentence text is obtained through encoding processing for predicting the context of the sentence text, where the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model;
  • the trained sequence-to-sequence model is obtained through the following steps:
  • using the initial sequence-to-sequence model, encoding processing and context decoding processing are performed on the current sentence in the context sentence pair sequences of the constructed sentence sample set, to obtain the preceding predicted sentence and the following predicted sentence of the current sentence;
  • based on the preceding predicted sentence and the following predicted sentence, the trained sequence-to-sequence model is obtained.
  • A sentence vector generation device is provided, which includes:
  • a model training module, which can be used to use the initial sequence-to-sequence model to perform encoding processing and context decoding processing on the current sentence in the context sentence pair sequences of the constructed sentence sample set, obtaining the preceding predicted sentence and the following predicted sentence of the current sentence; and, based on the preceding predicted sentence and the following predicted sentence, to obtain the trained sequence-to-sequence model;
  • a preprocessing module, which is used to perform semantic segmentation on the obtained initial sentence text to obtain segmented sentence text;
  • an encoding module, configured to utilize a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, where the sentence vector generation model is the encoding layer of the trained sequence-to-sequence model.
  • A storage medium is provided, on which a computer program is stored. When the program is executed, the above sentence vector generation method is implemented, including:
  • using a pre-built sentence vector generation model, the vector representation of the sentence text is obtained through encoding processing for predicting the context of the sentence text, where the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model;
  • the trained sequence-to-sequence model is obtained through the following steps:
  • using the initial sequence-to-sequence model, encoding processing and context decoding processing are performed on the current sentence in the context sentence pair sequences of the constructed sentence sample set, to obtain the preceding predicted sentence and the following predicted sentence of the current sentence;
  • based on the preceding predicted sentence and the following predicted sentence, the trained sequence-to-sequence model is obtained.
  • A computer device is provided, including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor. When the processor executes the program, the above sentence vector generation method is realized, including:
  • using a pre-built sentence vector generation model, the vector representation of the sentence text is obtained through encoding processing for predicting the context of the sentence text, where the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model;
  • the trained sequence-to-sequence model is obtained through the following steps:
  • using the initial sequence-to-sequence model, encoding processing and context decoding processing are performed on the current sentence in the context sentence pair sequences of the constructed sentence sample set, to obtain the preceding predicted sentence and the following predicted sentence of the current sentence;
  • based on the preceding predicted sentence and the following predicted sentence, the trained sequence-to-sequence model is obtained.
  • By training a sequence-to-sequence model on context sentence pair sequences and using the encoding layer of the trained sequence-to-sequence model to generate sentence vectors, this application can effectively improve the accuracy of sentence vector generation while increasing the difficulty of model training, and ensures the integrity of the semantic information and grammatical information of the generated sentence vectors.
  • It thereby effectively avoids the technical problems of the existing solutions: the construction method based on the average word vector destroys the dependencies between words in a sentence, resulting in low accuracy of sentence feature extraction; and in the construction method based on contrastive learning, the training difficulty of the model is low, the transfer ability of the model in actual tasks is insufficient, and the accuracy of the generated sentence vectors is low.
  • Figure 1 shows a schematic flowchart of a sentence vector generation method provided by an embodiment of the present application
  • Figure 2 shows a schematic flowchart of another sentence vector generation method provided by an embodiment of the present application
  • Figure 3 shows a schematic diagram of the initial sequence-to-sequence model architecture provided by the embodiment of the present application
  • Figure 4 shows a schematic structural diagram of a sentence vector generation device provided by an embodiment of the present application
  • Figure 5 shows a schematic structural diagram of another sentence vector generation device provided by an embodiment of the present application.
  • Artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometric technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • this embodiment provides a sentence vector generation method, as shown in Figure 1.
  • This method is explained by taking its application to computer equipment such as servers as an example.
  • The server can be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery network (CDN) services, and big data and artificial intelligence platforms.
  • the above method includes the following steps:
  • Step 101 Perform semantic segmentation on the obtained initial sentence text to obtain segmented sentence text.
  • For example, in the book recommendation scenario, this method is suitable for recommending other similar books based on the obtained book text content.
  • When a book recommendation request is received, the book text content corresponding to the book title in the request is obtained, the book text content is segmented based on Chinese punctuation, and multiple sentence texts are obtained through text segmentation for input into the sentence vector generation model.
  • The book text content can be book abstract text, book introduction text, etc., which is not specifically limited here.
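As a sketch of this segmentation step, the book text content can be split on Chinese end-of-sentence punctuation as follows. The exact punctuation set (。！？；) and the function name are assumptions for illustration; the patent only states that segmentation is based on Chinese punctuation.

```python
import re

def split_sentences(text):
    """Split Chinese text into sentence texts at end-of-sentence punctuation.

    Keeps the punctuation attached to the sentence it ends; the punctuation
    set here is an illustrative assumption.
    """
    parts = re.split(r"(?<=[。！？；])", text)
    return [p.strip() for p in parts if p.strip()]

print(split_sentences("这本书讲述人工智能。内容深入浅出！适合初学者？"))
# → ['这本书讲述人工智能。', '内容深入浅出！', '适合初学者？']
```

Each element of the returned list is one sentence text ready to be fed to the sentence vector generation model.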
  • Step 102: Use a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing for predicting the context of the sentence text, where the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model.
  • The trained sequence-to-sequence model is obtained through the following steps: using the initial sequence-to-sequence model, encoding processing and context decoding processing are performed on the current sentence in the context sentence pair sequences of the constructed sentence sample set, to obtain the preceding predicted sentence and the following predicted sentence of the current sentence; and, based on the preceding predicted sentence and the following predicted sentence, the trained sequence-to-sequence model is obtained.
  • Specifically, the initial sequence-to-sequence model is trained on the constructed sentence sample set, which contains context sentence pair sequences; each context sentence pair sequence includes a current sentence and the context sentences corresponding to it.
  • The current sentence is input into the encoding layer of the initial sequence-to-sequence model for encoding, producing a vector representation that contains the context feature information of the current sentence.
  • This vector representation is then input into the two decoding layers set up in parallel in the initial sequence-to-sequence model, and the preceding and following sentences of the current sentence are obtained through decoding.
  • The encoding layer of the trained sequence-to-sequence model thus has the ability to accurately predict the context of the current sentence and retains the integrity of the semantic and grammatical information of that context; the vector representation output on this basis therefore contains the complete contextual feature information of the current sentence, ensuring the accuracy of subsequent book recommendations.
  • Using the context sentence pair sequences constructed from each current sentence and its context sentences as the input data of the initial sequence-to-sequence model retains the interdependence and mutual influence between words without destroying the overall structure of the text data.
  • This ensures that the model can learn the complete semantic and grammatical information contained in the sentence text, improving the accuracy of the model in extracting contextual sentence features.
  • According to the above solution, the obtained initial sentence text can be semantically segmented to obtain segmented sentence text, and a pre-built sentence vector generation model can be used to obtain the vector representation of the sentence text through encoding processing for predicting its context.
  • The sentence vector generation model is the encoding layer of the trained sequence-to-sequence model, where the trained sequence-to-sequence model is obtained through the following steps: using the initial sequence-to-sequence model, encoding processing and context decoding processing are performed on the current sentence in the context sentence pair sequences of the constructed sentence sample set, to obtain the preceding predicted sentence and the following predicted sentence of the current sentence; and, based on the preceding predicted sentence and the following predicted sentence, the trained sequence-to-sequence model is obtained.
  • Compared with the existing technology, this embodiment trains a sequence-to-sequence model on context sentence pair sequences and uses the encoding layer of the trained sequence-to-sequence model to generate sentence vectors.
  • The generated sentence vector of the sentence text can thus retain the integrity of the semantic and grammatical information of the sentence text, thereby effectively improving the accuracy of sentence vector generation.
  • Step 201: Use the initial sequence-to-sequence model to perform encoding processing and context decoding processing on the current sentence in the context sentence pair sequences of the constructed sentence sample set, to obtain the preceding predicted sentence and the following predicted sentence of the current sentence.
  • The context sentence pair sequence specifically includes: the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction; and the preceding target sentence and following target sentence, which are used to train the output results of the initial sequence-to-sequence model. The output results are the preceding predicted sentence and the following predicted sentence produced during model training.
  • In specific application scenarios, step 201 may specifically include: using a word segmentation tool to perform word segmentation processing on the context sentence pair sequences, obtaining context sentence pair sequences after word segmentation;
  • using the encoding layer of the initial sequence-to-sequence model to obtain the sentence embedding vector of the current sentence in the segmented context sentence pair sequence; and, based on the sentence embedding vector of the current sentence,
  • using the two decoding layers set up in parallel in the initial sequence-to-sequence model to obtain the preceding predicted sentence and the following predicted sentence respectively, where the two decoding layers refer to the first decoding layer used to predict the preceding sentence and the second decoding layer used to predict the following sentence.
  • The first decoding layer used to predict the preceding sentence is a first GRU model, and the second decoding layer used to predict the following sentence is a second GRU model.
  • The step of obtaining the preceding predicted sentence and the following predicted sentence respectively, based on the sentence embedding vector of the current sentence and using the two decoding layers set up in parallel in the initial sequence-to-sequence model, specifically includes:
  • using the sentence embedding vector of the current sentence as the input data of the reset gate, update gate and candidate memory unit in the first GRU model, and obtaining the preceding predicted sentence of the current sentence through decoding processing;
  • and using the sentence embedding vector of the current sentence as the input data of the second GRU model, and obtaining the following predicted sentence of the current sentence through decoding processing.
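The one-encoder, two-decoder data flow described above can be sketched in miniature as follows. This is an illustrative toy, not the patent's implementation: the dimensions, random weights, and the way each decoder feeds its hidden state back to itself stand in for real token generation, and the pre-Decoder here uses a standard GRU step rather than the modified gates described later.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden/embedding size (toy value; the patent does not fix dimensions)

def gru_step(x, h, p):
    """One standard GRU step with update gate z, reset gate r."""
    z = 1 / (1 + np.exp(-(p["Wz"] @ x + p["Uz"] @ h)))
    r = 1 / (1 + np.exp(-(p["Wr"] @ x + p["Ur"] @ h)))
    h_cand = np.tanh(p["Wk"] @ x + p["Uk"] @ (r * h))
    return (1 - z) * h + z * h_cand

def make_params():
    return {k: rng.standard_normal((d, d)) * 0.1
            for k in ("Wz", "Uz", "Wr", "Ur", "Wk", "Uk")}

encoder, pre_decoder, next_decoder = make_params(), make_params(), make_params()

def encode(token_vectors):
    """Run the encoder GRU over the current sentence's token vectors;
    the final hidden state plays the role of the sentence embedding h_s."""
    h = np.zeros(d)
    for x in token_vectors:
        h = gru_step(x, h, encoder)
    return h

def decode(h_s, params, steps=3):
    """Unroll a decoder GRU from h_s, feeding the hidden state back in
    as the next input (a toy stand-in for token-by-token generation)."""
    h, outs = h_s, []
    for _ in range(steps):
        h = gru_step(h, h, params)
        outs.append(h)
    return outs

sentence = [rng.standard_normal(d) for _ in range(4)]
h_s = encode(sentence)
prev_pred = decode(h_s, pre_decoder)   # pre-Decoder: preceding sentence
next_pred = decode(h_s, next_decoder)  # next-Decoder: following sentence
```

The key structural point is that the same h_s is handed to both decoders in parallel, so gradients from both the preceding-sentence and following-sentence objectives flow back into the one shared encoder.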
  • Before the step of using the initial sequence-to-sequence model to obtain the preceding predicted sentence and the following predicted sentence of the current sentence based on the current sentence in the context sentence pair sequence, the method also includes: constructing a sentence sample set containing the context sentence pair sequences. Specific steps include:
  • The context sentence pair sequences are expressed as (S_1, S_2, S_3), (S_2, S_3, S_4), (S_3, S_4, S_5), ..., (S_{i-1}, S_i, S_{i+1}), ..., (S_{n-2}, S_{n-1}, S_n), where S_i represents the current sentence, S_{i-1} represents the preceding target sentence adjacent to S_i, and S_{i+1} represents the following target sentence adjacent to S_i.
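Constructing these (S_{i-1}, S_i, S_{i+1}) triples from an ordered list of sentences can be sketched as follows (the function name is illustrative):

```python
def build_context_pairs(sentences):
    """Build context sentence pair sequences (S_{i-1}, S_i, S_{i+1})
    from an ordered list of sentences S_1..S_n."""
    return [(sentences[i - 1], sentences[i], sentences[i + 1])
            for i in range(1, len(sentences) - 1)]

s = ["S1", "S2", "S3", "S4", "S5"]
print(build_context_pairs(s))
# → [('S1', 'S2', 'S3'), ('S2', 'S3', 'S4'), ('S3', 'S4', 'S5')]
```

Each middle element is a current sentence; its neighbours serve as the preceding and following target sentences for the two decoders.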
  • During training, the encoding layer (Encoder) of the initial sequence-to-sequence model outputs the sentence embedding vector h_s of the current sentence, which is simultaneously input into the first decoding layer (pre-Decoder), used to predict the preceding sentence, and the second decoding layer (next-Decoder), used to predict the following sentence.
  • The first decoding layer pre-Decoder and the second decoding layer next-Decoder are used to obtain, respectively, the preceding predicted sentence and the following predicted sentence of the current sentence, as shown in Figure 3. Specific steps include:
  • the initial sequence-to-sequence model includes one encoding layer and two decoding layers, and the basic model of both the encoding layer and the decoding layers is the gated recurrent unit (GRU: Gate Recurrent Unit).
  • The first decoding layer pre-Decoder and the second decoding layer next-Decoder decode the sentence embedding vector h_s synchronously, obtaining the preceding predicted sentence and the following predicted sentence corresponding to the current sentence. This specifically includes:
  • taking the sentence embedding vector h_s as the input of the first decoding layer pre-Decoder (preceding-sentence decoding), and obtaining the preceding predicted sentence Y_{i-1} corresponding to the current sentence through decoding processing.
  • That is, the sentence embedding vector h_s of the current sentence S_i is used to predict the preceding predicted sentence Y_{i-1} corresponding to the current sentence. Since predicting backwards does not conform to the characteristics of natural language, the training difficulty of the first decoding layer pre-Decoder is greater than that of the second decoding layer next-Decoder; the GRU model architecture is therefore improved, so as to improve the accuracy of preceding-sentence prediction while ensuring training efficiency and preventing gradient disappearance.
  • Specifically, the GRU model at each time step can incorporate the sentence embedding vector h_s of the current sentence S_i.
  • The specific formulas are as follows:
  • z_t = σ(W_z x_t + U_z h_{t-1} + V_z h_s)
  • r_t = σ(W_r x_t + U_r h_{t-1} + V_r h_s)
  • h̃_t = tanh(W_k x_t + U_k (r_t ⊙ h_{t-1}) + V_k h_s)
  • h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t
  • where z_t represents the update gate of the GRU model; W_z, U_z are the update gate parameters of the original GRU model; x_t represents the input vector at the current time t; h_{t-1} represents the output vector at the previous time t-1; V_z represents the parameter set for the sentence embedding vector h_s; the reset gate r_t and the candidate memory unit h̃_t likewise incorporate h_s, with W_r, U_r, V_r the parameters of the reset gate and W_k, U_k, V_k the parameters of the candidate memory unit; tanh represents the activation function; h_t represents the output vector at the current time t; σ represents the fully connected layer with an activation function; and ⊙ represents element-wise multiplication of vectors.
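A minimal numpy sketch of one step of this modified GRU follows. It is illustrative only: the weight values are random, and the convex-combination form of the final state h_t follows the standard GRU convention, which the translated text does not spell out.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def modified_gru_step(x_t, h_prev, h_s, p):
    """One step of the pre-Decoder GRU: the sentence embedding h_s enters
    the update gate, reset gate and candidate memory unit through the extra
    parameters V_z, V_r, V_k, as in the formulas for z_t, r_t, h̃_t."""
    z_t = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["Vz"] @ h_s)
    r_t = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["Vr"] @ h_s)
    h_cand = np.tanh(p["Wk"] @ x_t + p["Uk"] @ (r_t * h_prev) + p["Vk"] @ h_s)
    # Standard GRU blending of old state and candidate (assumed convention).
    return (1 - z_t) * h_prev + z_t * h_cand

d = 6  # toy dimension
rng = np.random.default_rng(1)
params = {k: rng.standard_normal((d, d)) * 0.1
          for k in ("Wz", "Uz", "Vz", "Wr", "Ur", "Vr", "Wk", "Uk", "Vk")}
h_t = modified_gru_step(rng.standard_normal(d), np.zeros(d),
                        rng.standard_normal(d), params)
```

Compared with a plain GRU step, the only change is the three additional V·h_s terms, which is exactly how the current sentence's embedding is injected at every decoding time step.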
  • Take the sentence embedding vector h_s as the initial vector of the second decoding layer next-Decoder and, through decoding processing, obtain the following predicted sentence Y_{i+1} corresponding to the current sentence. Predicting the following sentence from the current sentence conforms to the top-down characteristics of natural language; therefore, the second decoding layer next-Decoder uses the existing GRU model, and the sentence embedding vector h_s is only used as its initial vector.
  • In this way, predicting the preceding sentence of the current sentence within the encoder-decoder model framework breaks the top-down rule of natural language, increases the difficulty of model training, and enables the model to be fully trained, so that it outputs sentence vector representations with complete semantic and grammatical information. Furthermore, by improving the update gate, reset gate and candidate memory unit of the GRU model, the training efficiency of the model can be effectively ensured while the difficulty of model training is increased.
  • Step 202: Use the target loss function to train the initial sequence-to-sequence model based on the preceding predicted sentence and the following predicted sentence of the current sentence, to obtain a trained sequence-to-sequence model.
  • The target loss function is determined as the sum of a first loss function and a second loss function; the first loss function is set based on the first decoding layer used to predict the preceding sentence, and the second loss function is set based on the second decoding layer used to predict the following sentence.
  • In specific application scenarios, the target loss function is used to train the network parameters of the initialized sequence-to-sequence model until the model converges, yielding the trained sequence-to-sequence model.
  • The cross-entropy loss function is used as the basic loss function, of the form:
  • CE(S, Y) = -Σ_{j=1}^{l} log P(y_j = t_j)
  • where CE represents the cross-entropy loss function; S represents the current sentence; Y represents the predicted sentence generated by the decoding layer (Decoder); l represents the number of tokens determined after segmentation of the current sentence S; t_j represents the j-th token obtained by segmenting the current sentence S; and y_j represents the j-th token in the predicted sentence Y.
  • Based on the basic loss function, the corresponding preceding-sentence loss function (first loss function) and following-sentence loss function (second loss function) are determined, and the target loss function of the initialized sequence-to-sequence model is then obtained as the sum of the preceding-sentence loss function and the following-sentence loss function.
  • The initialized sequence-to-sequence model is trained until its target loss function value converges; training then ends, and the trained sequence-to-sequence model is obtained.
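As a toy illustration of this target loss, the first and second loss functions are per-token cross-entropies and the target loss is their sum. The probability values, vocabulary size and token ids below are made up for the example:

```python
import numpy as np

def token_cross_entropy(probs, target_ids):
    """Cross-entropy of one predicted sentence: -sum_j log P(y_j = t_j),
    where probs[j] is the decoder's distribution over the vocabulary at
    position j and target_ids[j] is the id of the target token t_j."""
    return -sum(np.log(probs[j][t]) for j, t in enumerate(target_ids))

# Toy decoder outputs over a 3-token vocabulary (illustrative values only).
pre_probs = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.8, 0.1])]
next_probs = [np.array([0.6, 0.3, 0.1])]

loss_pre = token_cross_entropy(pre_probs, [0, 1])   # first loss function
loss_next = token_cross_entropy(next_probs, [0])    # second loss function
target_loss = loss_pre + loss_next                  # sum of the two
```

Minimizing target_loss trains the shared encoder against both decoding objectives at once, which is what forces h_s to carry both preceding-sentence and following-sentence information.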
  • Step 203 Perform semantic segmentation on the obtained initial sentence text to obtain segmented sentence text.
  • Step 204: Use a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing for predicting the context of the sentence text, where the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model.
  • In specific application scenarios, the encoding layer of the trained sequence-to-sequence model is extracted as the sentence vector generation model. After a book recommendation request is received, the introduction text corresponding to the book title in the request is obtained, the introduction text is segmented into sentences based on Chinese punctuation, and the Harbin Institute of Technology LTP model is used to perform word segmentation on the segmented introduction text, obtaining the sentence text after word segmentation.
  • The sentence vector generation model is then used to encode the sentence text to obtain the vector representation of the sentence text.
  • Step 205: Calculate the similarity value between the vector representation of the sentence text and the sentence embedding vectors in the preset book sample library, where the sentence embedding vectors in the preset book sample library are obtained from the output of the sentence vector generation model.
  • In specific application scenarios, the sentence vector generation model is used to output the sentence embedding vector corresponding to each book's introduction text, and the preset book sample library is constructed from these output sentence embedding vectors.
  • The cosine similarity algorithm is then used to calculate the similarity value between the sentence vector output for the book recommendation request and the sentence embedding vector corresponding to each book in the preset book sample library.
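The cosine-similarity ranking in this step can be sketched as follows; the book titles and vectors are hypothetical placeholders for model outputs:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two sentence vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical query vector and a small "book sample library" of sentence
# embedding vectors keyed by book title.
query = np.array([1.0, 0.0, 1.0])
library = {
    "book_a": np.array([1.0, 0.1, 0.9]),
    "book_b": np.array([-1.0, 0.5, 0.0]),
}

# Rank library entries by similarity to the query, in descending order.
ranked = sorted(library, key=lambda k: cosine_similarity(query, library[k]),
                reverse=True)
print(ranked[0])  # → book_a
```

Books whose similarity values meet the preset condition (for example, the top-k of this descending order) would then be returned as recommendations.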
  • Step 206 Generate book recommendation information for the sentence text based on the sentence embedding vectors whose similarity values meet the preset conditions in the preset book sample library.
  • For example, when a user browses a book on the platform, that book is taken as the target book, and a book recommendation request containing the title of the target book is generated.
  • The sentence vector generation model is used to generate the corresponding sentence vectors, and the similarity values between the generated sentence vectors and each set of sentence embedding vectors in the platform's preset book sample library are calculated and arranged in descending order.
  • The book information corresponding to the sentence embedding vectors whose similarity values meet the preset conditions is then recommended to the user as similar books.
  • According to this embodiment, the obtained initial sentence text is semantically segmented to obtain segmented sentence text, and the pre-built sentence vector generation model is used to obtain the vector representation of the sentence text through encoding processing for predicting its context.
  • The sentence vector generation model is the encoding layer of the trained sequence-to-sequence model, where the trained sequence-to-sequence model is obtained through the following steps: using the initial sequence-to-sequence model, encoding processing and context decoding processing are performed on the current sentence in the context sentence pair sequences of the constructed sentence sample set, to obtain the preceding predicted sentence and the following predicted sentence of the current sentence; and, based on the preceding predicted sentence and the following predicted sentence, the trained sequence-to-sequence model is obtained.
  • Training a sequence-to-sequence model on context sentence pair sequences and using the encoding layer of the trained model to generate sentence vectors can effectively improve the accuracy of sentence vector generation while increasing the difficulty of model training, and ensures the integrity of the semantic and grammatical information of the generated sentence vectors.
  • It thereby effectively avoids the technical problems of the existing solutions: the construction method based on the average word vector destroys the dependencies between words in a sentence, resulting in low accuracy of sentence feature extraction; and in the construction method based on contrastive learning, the training difficulty of the model is low, the transfer ability of the model in actual tasks is insufficient, and the accuracy of the generated sentence vectors is low.
  • the embodiment of the present application provides a sentence vector generation device, as shown in Figure 4.
  • the device includes: a model training module 41, a preprocessing module 42, and an encoding module 43.
  • The model training module 41 can be used to use the initial sequence-to-sequence model to perform encoding processing and context decoding processing on the current sentence in the context sentence pair sequences of the constructed sentence sample set, obtaining the preceding predicted sentence and the following predicted sentence of the current sentence; and, based on the preceding predicted sentence and the following predicted sentence, to obtain the trained sequence-to-sequence model.
  • the preprocessing module 42 can be used to perform semantic segmentation on the obtained initial sentence text to obtain segmented sentence text.
  • the encoding module 43 may be used to utilize a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing for predicting the context of the sentence text.
  • the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model.
  • a book recommendation module 44 is also included.
  • the model training module 41 includes a training unit 411.
  • the training unit 411 may be used to train the initial sequence-to-sequence model using a target loss function, based on the preceding predicted sentence and the following predicted sentence of the current sentence, to obtain the trained sequence-to-sequence model; wherein the target loss function is determined as the sum of the first loss function and the second loss function.
  • the context sentence pair sequence specifically includes: a current sentence to be input into the encoding layer of the initial sequence-to-sequence model for context sentence prediction; and a preceding target sentence and a following target sentence used to train the output results of the initial sequence-to-sequence model, the output results being the preceding predicted sentence and the following predicted sentence produced during model training.
  • the model training module 41 can be used to perform word segmentation processing on the context sentence pair sequence using a word segmentation tool, obtaining a segmented context sentence pair sequence.
  • for the current sentence in the segmented context sentence pair sequence, the encoding layer of the initial sequence-to-sequence model is used to obtain the sentence embedding vector of the current sentence.
  • from the sentence embedding vector of the current sentence, the two decoding layers set up in parallel in the initial sequence-to-sequence model are used to obtain the preceding predicted sentence and the following predicted sentence respectively, where the two decoding layers refer to a first decoding layer for predicting the preceding context and a second decoding layer for predicting the following context.
  • the first decoding layer used to predict the preceding context is a first GRU model
  • the second decoding layer used to predict the following context is a second GRU model
  • the step of using the sentence embedding vector of the current sentence with the two decoding layers set up in parallel in the initial sequence-to-sequence model to obtain the preceding predicted sentence and the following predicted sentence respectively specifically includes:
  • using the sentence embedding vector of the current sentence as the input data of the reset gate, the update gate and the candidate memory unit in the first GRU model, and obtaining the preceding predicted sentence of the current sentence through decoding processing; and
  • using the sentence embedding vector of the current sentence as the input data of the second GRU model, and obtaining the following predicted sentence of the current sentence through decoding processing.
  • the first loss function in the target loss function is set based on the first decoding layer used for predicting the preceding context, and the second loss function in the target loss function is set based on the second decoding layer used for predicting the following context.
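The sum-of-losses objective described above can be written out as follows. The patent gives no explicit formulas, so the cross-entropy form and the symbols below (h, θ₁, θ₂) are assumptions used only to make the relationship concrete:

```latex
L_{\text{target}} = L_{1} + L_{2}, \qquad
L_{1} = -\sum_{t}\log p_{\theta_1}\!\left(w^{\text{pre}}_{t}\mid w^{\text{pre}}_{<t},\,\mathbf{h}\right), \qquad
L_{2} = -\sum_{t}\log p_{\theta_2}\!\left(w^{\text{fol}}_{t}\mid w^{\text{fol}}_{<t},\,\mathbf{h}\right)
```

where **h** denotes the sentence embedding vector of the current sentence produced by the encoding layer, θ₁ parameterizes the first (preceding-context) GRU decoder, θ₂ parameterizes the second (following-context) GRU decoder, and w^pre, w^fol are the tokens of the preceding and following target sentences.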
  • the book recommendation module 44 includes a similarity calculation unit 441 and a generation unit 442.
  • the similarity calculation unit 441 may be used to calculate the similarity value between the vector representation of the sentence text and the sentence embedding vector in the preset book sample library.
  • the generation unit 442 may be configured to generate book recommendation information for the sentence text based on the sentence embedding vectors in the preset book sample library whose similarity values satisfy a preset condition; wherein the sentence embedding vectors in the preset book sample library are obtained as outputs of the sentence vector generation model.
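As a rough illustration of how the similarity calculation unit 441 and the generation unit 442 might work together, the sketch below uses cosine similarity as the similarity measure and a similarity threshold as the "preset condition". Both choices, and every name in the code, are assumptions — the patent does not specify the similarity measure or the condition:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity value between two sentence vectors (unit 441's role)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend_books(query_vec: np.ndarray, book_library: dict, threshold: float = 0.8) -> list:
    """Return titles whose stored sentence embedding satisfies the preset
    condition (here: cosine similarity >= threshold) — unit 442's role.
    book_library maps a book title to its precomputed sentence embedding,
    which the patent says is produced by the sentence vector generation model."""
    recommendations = []
    for title, book_vec in book_library.items():
        if cosine_similarity(query_vec, book_vec) >= threshold:
            recommendations.append(title)
    return recommendations
```

In practice the library embeddings would be stacked into a matrix so all similarity values are computed in one vectorized operation; the loop above is kept for readability.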
  • a sentence vector generation method includes:
  • obtaining the vector representation of the sentence text through encoding processing used for predicting the context of the sentence text, the sentence vector generation model being the encoding layer of the trained sequence-to-sequence model;
  • the trained sequence-to-sequence model is obtained through the following steps:
  • using the initial sequence-to-sequence model, encoding processing and context decoding processing are performed on the current sentence in the context sentence pair sequences of the constructed sentence sample set, obtaining the preceding predicted sentence and the following predicted sentence of the current sentence;
  • based on the preceding predicted sentence and the following predicted sentence, the trained sequence-to-sequence model is obtained.
  • the step of obtaining the trained sequence-to-sequence model based on the preceding predicted sentence and the following predicted sentence includes:
  • using the target loss function to train the initial sequence-to-sequence model to obtain the trained sequence-to-sequence model
  • the target loss function is determined based on the sum of the first loss function and the second loss function.
  • the context sentence pair sequence specifically includes:
  • the preceding target sentence and the following target sentence used to train the output results of the initial sequence-to-sequence model, the output results being the preceding predicted sentence and the following predicted sentence produced during model training.
  • the storage medium is a computer-readable storage medium, which may be non-volatile or volatile.
  • the technical solution of the present application can be embodied in the form of a software product. The software product can be stored in a storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) and includes a number of instructions that enable a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in each implementation scenario of this application.
  • embodiments of the present application also provide a computer device, which can be a personal computer, a server, a network device, etc. The physical device includes a storage medium and a processor; the storage medium is used to store a computer program; the processor is used to execute the computer program to implement the above sentence vector generation method shown in Figure 1 and Figure 2, including:
  • obtaining the vector representation of the sentence text through encoding processing used for predicting the context of the sentence text, the sentence vector generation model being the encoding layer of the trained sequence-to-sequence model;
  • the trained sequence-to-sequence model is obtained through the following steps:
  • using the initial sequence-to-sequence model, encoding processing and context decoding processing are performed on the current sentence in the context sentence pair sequences of the constructed sentence sample set, obtaining the preceding predicted sentence and the following predicted sentence of the current sentence;
  • based on the preceding predicted sentence and the following predicted sentence, the trained sequence-to-sequence model is obtained.
  • the step of obtaining the trained sequence-to-sequence model based on the preceding predicted sentence and the following predicted sentence includes:
  • using the target loss function to train the initial sequence-to-sequence model to obtain the trained sequence-to-sequence model
  • the target loss function is determined based on the sum of the first loss function and the second loss function.
  • the context sentence pair sequence specifically includes:
  • the preceding target sentence and the following target sentence used to train the output results of the initial sequence-to-sequence model, the output results being the preceding predicted sentence and the following predicted sentence produced during model training.
  • the computer device may also include a user interface, a network interface, a camera, a radio frequency (Radio Frequency, RF) circuit, a sensor, an audio circuit, a WI-FI module, etc.
  • the user interface may include a display screen (Display), an input unit such as a keyboard (Keyboard), etc.
  • the optional user interface may also include a USB interface, a card reader interface, etc.
  • Optional network interfaces may include standard wired interfaces, wireless interfaces (such as Bluetooth interfaces, WI-FI interfaces), etc.
  • the structure of the computer device does not constitute a limitation on the physical device, which may include more or fewer components, combine certain components, or adopt a different arrangement of components.
  • the storage medium may also include an operating system and a network communication module.
  • An operating system is a program that manages the hardware and software resources of a computer device and supports the operation of information processing programs and other software and/or programs.
  • the network communication module is used to implement communication between components within the storage medium, as well as communication with other hardware and software in the physical device.
  • this embodiment performs sequence-to-sequence model training on context sentence pair sequences and uses the encoding layer of the well-trained sequence-to-sequence model to generate the sentence vectors of sentence texts, which can ensure the integrity of the semantic information and grammatical information of the sentence text and thereby effectively improve the accuracy of sentence vector generation.
  • this effectively avoids the existing construction method based on averaged word vectors, which destroys the dependencies between words in a sentence and results in low accuracy of sentence feature extraction.
  • it also avoids the construction method based on contrastive learning, in which the training difficulty of the model is low and the transfer ability of the model in actual tasks is insufficient.
  • the accompanying drawings are only schematic diagrams of preferred implementation scenarios, and the modules or processes in the accompanying drawings are not necessarily required for implementing the present application.
  • the modules in the devices in the implementation scenario can be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or can be correspondingly changed and located in one or more devices different from the implementation scenario.
  • the modules of the above implementation scenarios can be combined into one module or further split into multiple sub-modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)

Abstract

The present application discloses a sentence vector generation method and apparatus, a computer device, and a storage medium, which relate to the technical field of artificial intelligence, and can improve the accuracy of sentence vector generation. The method comprises: performing semantic segmentation on obtained initial sentence text to obtain segmented sentence text; and utilizing a pre-constructed sentence vector generation model to obtain a vector representation of the sentence text by means of encoding processing used to predict the context of the sentence text, the sentence vector generation model being an encoding layer of a trained sequence-to-sequence model. The present application is suitable for book recommendation on the basis of sentence vectors of book texts.

Description

Sentence vector generation method, apparatus, computer device and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on March 9, 2022, with application number 202210232057.9 and the application title "Sentence Vector Generation Method, Apparatus, Computer Device and Storage Medium", the entire content of which is incorporated in this application by reference.
Technical field
This application relates to the field of artificial intelligence technology, and in particular to sentence vector generation methods, apparatuses, computer devices and storage media.
Background
Natural language processing is an important direction in the fields of computer science and artificial intelligence. Sentence embeddings, as vector representations of text data, are widely used in many application scenarios of natural language processing. By mapping text data into a quantifiable vector space, sentence vector representations that characterize the features, semantics, grammar and other information of the text data are obtained; vector clustering, classification and other methods can then be used to derive the relationships between text sentences, enabling the application of sentence vectors in practical scenarios.
Existing solutions for sentence vector construction mainly include construction methods based on averaged word vectors, such as word2vec, glove and bert, and construction methods based on contrastive learning, which construct positive samples for contrastive learning in different ways, such as dropout, replacement, deletion and back-translation. The inventor realized that the shortcomings of the existing solutions are: 1) the construction method based on averaged word vectors destroys the dependencies between words in a sentence, so the accuracy of feature extraction is low; 2) in the construction method based on contrastive learning, although there are many ways to obtain positive samples, the similarity between randomly selected negative samples and the original sentences is low, which makes model training too easy; the model's transfer ability in actual tasks is therefore insufficient, which in turn leads to low accuracy of the generated sentence vectors.
Summary of the invention
In view of this, this application provides a sentence vector generation method, apparatus, computer device and storage medium, the main purpose of which is to solve the technical problems in the prior art that the construction method based on averaged word vectors has low accuracy in sentence feature extraction, and that the construction method based on contrastive learning leaves the model with insufficient transfer ability in actual tasks, resulting in low accuracy of the generated sentence vectors.
According to one aspect of the present application, a sentence vector generation method is provided. The method includes:
performing semantic segmentation on the obtained initial sentence text to obtain segmented sentence text;
using a pre-built sentence vector generation model, obtaining a vector representation of the sentence text through encoding processing used for predicting the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
wherein the trained sequence-to-sequence model is obtained through the following steps:
using the initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in the context sentence pair sequences of the constructed sentence sample set to obtain the preceding predicted sentence and the following predicted sentence of the current sentence;
obtaining the trained sequence-to-sequence model based on the preceding predicted sentence and the following predicted sentence.
According to another aspect of the present application, a sentence vector generation apparatus is provided. The apparatus includes:
a model training module, which can be used to: use the initial sequence-to-sequence model to perform encoding processing and context decoding processing on the current sentence in the context sentence pair sequences of the constructed sentence sample set, obtaining the preceding predicted sentence and the following predicted sentence of the current sentence; and, based on the preceding predicted sentence and the following predicted sentence, obtain the trained sequence-to-sequence model;
a preprocessing module, used to perform semantic segmentation on the obtained initial sentence text to obtain segmented sentence text;
an encoding module, used to utilize a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used for predicting the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model.
According to yet another aspect of the present application, a storage medium is provided, on which a computer program is stored. When the program is executed by a processor, the above sentence vector generation method is implemented, including:
performing semantic segmentation on the obtained initial sentence text to obtain segmented sentence text;
using a pre-built sentence vector generation model, obtaining a vector representation of the sentence text through encoding processing used for predicting the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
wherein the trained sequence-to-sequence model is obtained through the following steps:
using the initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in the context sentence pair sequences of the constructed sentence sample set to obtain the preceding predicted sentence and the following predicted sentence of the current sentence;
obtaining the trained sequence-to-sequence model based on the preceding predicted sentence and the following predicted sentence.
According to a further aspect of the present application, a computer device is provided, including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor. When the processor executes the program, the above sentence vector generation method is implemented, including:
performing semantic segmentation on the obtained initial sentence text to obtain segmented sentence text;
using a pre-built sentence vector generation model, obtaining a vector representation of the sentence text through encoding processing used for predicting the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
wherein the trained sequence-to-sequence model is obtained through the following steps:
using the initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in the context sentence pair sequences of the constructed sentence sample set to obtain the preceding predicted sentence and the following predicted sentence of the current sentence;
obtaining the trained sequence-to-sequence model based on the preceding predicted sentence and the following predicted sentence.
With the above technical solution, performing sequence-to-sequence model training on context sentence pair sequences and using the encoding layer of the trained sequence-to-sequence model to generate sentence vectors can, on the basis of raising the difficulty of model training, effectively improve the accuracy of sentence vector generation and ensure the integrity of the semantic information and grammatical information of the generated sentence vectors. This effectively avoids the technical problems that the existing construction method based on averaged word vectors destroys the dependencies between words in a sentence, resulting in low accuracy of sentence feature extraction, and that in the construction method based on contrastive learning, the training difficulty of the model is low, the transfer ability of the model in actual tasks is insufficient, and the accuracy of the generated sentence vectors is low.
The above description is only an overview of the technical solutions of the present application. In order that the technical means of the present application may be understood more clearly and implemented according to the content of the description, and in order to make the above and other objects, features and advantages of the present application more obvious and understandable, specific implementations of the present application are set out below.
Description of the drawings
The drawings described here are used to provide a further understanding of the present application and constitute a part of the present application. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not constitute an improper limitation of the present application. In the drawings:
Figure 1 shows a schematic flowchart of a sentence vector generation method provided by an embodiment of the present application;
Figure 2 shows a schematic flowchart of another sentence vector generation method provided by an embodiment of the present application;
Figure 3 shows a schematic diagram of the initial sequence-to-sequence model architecture provided by an embodiment of the present application;
Figure 4 shows a schematic structural diagram of a sentence vector generation device provided by an embodiment of the present application;
Figure 5 shows a schematic structural diagram of another sentence vector generation device provided by an embodiment of the present application.
Detailed description
The present application will be described in detail below with reference to the accompanying drawings and embodiments. It should be noted that, as long as there is no conflict, the embodiments of this application and the features in the embodiments can be combined with each other.
The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers or digital-computer-controlled machines to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics and other technologies. Artificial intelligence software technology mainly includes several major directions: computer vision technology, robotics, biometrics, speech processing technology, natural language processing technology, and machine learning/deep learning.
In view of the technical problems in the prior art that the construction method based on averaged word vectors has low accuracy in sentence feature extraction, and that the construction method based on contrastive learning leaves the model with insufficient transfer ability in actual tasks and produces sentence vectors of low accuracy, this embodiment provides a sentence vector generation method, as shown in Figure 1. The method is described as applied to a computer device such as a server, where the server may be an independent server or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms. The above method includes the following steps:
Step 101: Perform semantic segmentation on the obtained initial sentence text to obtain segmented sentence text.
In this embodiment, taking a book recommendation scenario as an example, the method is suitable for recommending other similar books based on the obtained book text content. Specifically, when a book recommendation request is received, the book text content corresponding to the book title in the request is obtained; the book text content is split into sentences based on Chinese punctuation, and multiple sentence texts to be input into the sentence vector generation model are obtained through text segmentation. Depending on the needs of the actual application scenario, the book text content can be book abstract text, book introduction text, etc., which is not specifically limited here.
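The punctuation-based sentence splitting in step 101 could be sketched as follows. The exact punctuation set is an assumption — the patent only states that the book text is split based on Chinese punctuation:

```python
import re

def split_sentences(book_text: str) -> list:
    """Split book text content into candidate sentence texts at Chinese
    end-of-sentence punctuation (。！？；), keeping each delimiter attached
    to the sentence it terminates via a zero-width lookbehind split."""
    parts = re.split(r"(?<=[。！？；])", book_text)
    return [p.strip() for p in parts if p.strip()]
```

Each returned sentence text would then be fed to the sentence vector generation model individually.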
Step 102: Use a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used for predicting the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model. The trained sequence-to-sequence model is obtained through the following steps: using the initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in the context sentence pair sequences of the constructed sentence sample set to obtain the preceding predicted sentence and the following predicted sentence of the current sentence; and obtaining the trained sequence-to-sequence model based on the preceding predicted sentence and the following predicted sentence.
In this embodiment, the initial sequence-to-sequence model is trained on a constructed sentence sample set containing context sentence pair sequences, where a context sentence pair sequence includes a current sentence and the context sentences corresponding to it. The current sentence is input into the encoding layer of the initial sequence-to-sequence model for encoding, yielding a vector representation containing the contextual feature information of the current sentence. This vector representation is then fed into the two decoding layers set up in parallel in the initial sequence-to-sequence model, and the preceding predicted sentence and the following predicted sentence of the current sentence are obtained through decoding. Further, by using the preceding sentence and the following sentence of the current sentence in the context sentence pair sequence as the training targets for the preceding predicted sentence and the following predicted sentence, the trained sequence-to-sequence model is obtained. The encoding layer of the trained sequence-to-sequence model thus has the encoding ability to accurately predict the context of the current sentence and can retain the integrity of the semantic and grammatical information of that context; on this basis, the output vector representation can contain the complete contextual feature information of the current sentence, which in turn ensures the accuracy of subsequent book recommendations.
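The encoder-plus-two-parallel-decoders structure described above resembles a skip-thought-style architecture. The sketch below is a minimal illustration under assumptions: the class names, dimensions and random initialization are hypothetical, and the decoders return hidden states rather than word distributions (the output projection, softmax and training loop are omitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell with a reset gate, an update gate and a candidate memory unit."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        def w(rows, cols):
            return rng.standard_normal((rows, cols)) * 0.1
        self.Wr, self.Ur = w(hidden_dim, input_dim), w(hidden_dim, hidden_dim)
        self.Wz, self.Uz = w(hidden_dim, input_dim), w(hidden_dim, hidden_dim)
        self.Wh, self.Uh = w(hidden_dim, input_dim), w(hidden_dim, hidden_dim)

    def step(self, x, h):
        r = sigmoid(self.Wr @ x + self.Ur @ h)              # reset gate
        z = sigmoid(self.Wz @ x + self.Uz @ h)              # update gate
        h_cand = np.tanh(self.Wh @ x + self.Uh @ (r * h))   # candidate memory unit
        return (1 - z) * h + z * h_cand

class DualDecoderSeq2Seq:
    """Encoder producing a sentence embedding, plus two GRU decoders set up in
    parallel: one for the preceding context, one for the following context."""
    def __init__(self, emb_dim, hidden_dim):
        self.encoder = GRUCell(emb_dim, hidden_dim, seed=1)
        self.decoder_pre = GRUCell(hidden_dim, hidden_dim, seed=2)  # first decoding layer
        self.decoder_fol = GRUCell(hidden_dim, hidden_dim, seed=3)  # second decoding layer
        self.hidden_dim = hidden_dim

    def encode(self, word_vectors):
        # Run the encoder over the word vectors of the current sentence;
        # the final hidden state serves as the sentence embedding vector.
        h = np.zeros(self.hidden_dim)
        for x in word_vectors:
            h = self.encoder.step(x, h)
        return h

    def decode_context(self, sentence_emb, steps=3):
        # Both decoders take the sentence embedding as input at every step,
        # mirroring its role as input to the gates of the GRU decoders.
        h_pre = np.zeros(self.hidden_dim)
        h_fol = np.zeros(self.hidden_dim)
        pre_states, fol_states = [], []
        for _ in range(steps):
            h_pre = self.decoder_pre.step(sentence_emb, h_pre)
            h_fol = self.decoder_fol.step(sentence_emb, h_fol)
            pre_states.append(h_pre)
            fol_states.append(h_fol)
        return pre_states, fol_states
```

After training, only `encode` would be kept as the sentence vector generation model; both decoders exist solely to shape the embedding during training.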
其中,将基于当前句子及其上下文句子构建的上下文句子对序列作为初始序列到序列模型的输入数据,能够不破坏文本数据的整体结构,保留词语之间相互依赖,相互影响的 文本特征,从而保证模型能够学习到句子文本蕴含的完整语义信息和语法信息,提升模型对上下文句子特征提取的准确性。Among them, the context sentence pair sequence constructed based on the current sentence and its context sentences is used as the input data of the initial sequence-to-sequence model, which can retain the interdependence and mutual influence between words without destroying the overall structure of the text data, thereby ensuring The model can learn the complete semantic information and grammatical information contained in the sentence text, improving the accuracy of the model in extracting contextual sentence features.
对于本实施例可以按照上述方案,对获取到的初始句子文本进行语义分割,得到分割后的句子文本,并利用预先构建的句子向量生成模型,通过用于预测所述句子文本上下文的编码处理,得到所述句子文本的向量表示,所述句子向量生成模型为训练好的序列到序列模型的编码层;其中,所述训练好的序列到序列模型通过下述步骤得到:利用初始序列到序列模型,对构建的句子样本集中的上下文句子对序列中的当前句子进行编码处理和上下文解码处理,得到所述当前句子的上文预测句子和下文预测句子;根据上文预测句子和下文预测句子,得到训练好的序列到序列模型。与现有基于词向量平均值的构造、基于对比学习的构造等句子向量生成方案相比,本实施例利用上下文句子对序列进行序列到序列模型训练,利用训练好的序列到序列模型的编码层生成的句子文本的句子向量,能够保证句子文本语义信息和语法信息的完整性,从而有效提升句子向量生成的准确性。For this embodiment, the obtained initial sentence text can be semantically segmented according to the above solution to obtain the segmented sentence text, and a pre-built sentence vector generation model can be used to predict the context of the sentence text through coding processing. Obtain the vector representation of the sentence text, and the sentence vector generation model is the encoding layer of the trained sequence-to-sequence model; wherein the trained sequence-to-sequence model is obtained through the following steps: using the initial sequence-to-sequence model , perform coding processing and context decoding processing on the context sentences in the constructed sentence sample set to the current sentence in the sequence, and obtain the upper prediction sentence and the lower prediction sentence of the current sentence; according to the upper prediction sentence and the lower prediction sentence, we get Trained sequence-to-sequence model. Compared with existing sentence vector generation solutions such as the construction based on word vector average and the construction based on contrastive learning, this embodiment uses context sentences to perform sequence-to-sequence model training on sequences, and uses the coding layer of the trained sequence-to-sequence model The generated sentence vector of the sentence text can ensure the integrity of the semantic information and grammatical information of the sentence text, thereby effectively improving the accuracy of sentence vector generation.
Further, as a refinement and extension of the specific implementation of the above embodiment, and in order to fully describe the implementation process of this embodiment, another sentence vector generation method is provided. As shown in Figure 2, the method includes:
Step 201: Using the initial sequence-to-sequence model, encode and context-decode the current sentence in each context sentence pair sequence of the constructed sentence sample set to obtain the predicted preceding sentence and predicted following sentence of the current sentence.
Here, a context sentence pair sequence specifically includes: the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction; and the preceding target sentence and following target sentence, which are used to supervise the output results of the initial sequence-to-sequence model, the output results being the predicted preceding sentence and predicted following sentence produced during model training.
To illustrate a specific implementation of step 201, as a preferred embodiment, step 201 may specifically include: performing word segmentation on the context sentence pair sequence with a word segmentation tool to obtain the segmented context sentence pair sequence; feeding the current sentence of the segmented context sentence pair sequence to the encoding layer of the initial sequence-to-sequence model to obtain the sentence embedding vector of the current sentence; and, from the sentence embedding vector of the current sentence, obtaining the predicted preceding sentence and the predicted following sentence with two decoding layers arranged in parallel in the initial sequence-to-sequence model, where the two decoding layers are a first decoding layer for predicting the preceding sentence and a second decoding layer for predicting the following sentence.
To illustrate step 201 further, as another preferred embodiment, the first decoding layer for predicting the preceding sentence is a first GRU model and the second decoding layer for predicting the following sentence is a second GRU model. The step of obtaining the predicted preceding sentence and predicted following sentence from the sentence embedding vector of the current sentence with the two parallel decoding layers specifically includes: feeding the sentence embedding vector of the current sentence to the reset gate, update gate, and candidate memory unit of the first GRU model and decoding to obtain the predicted preceding sentence of the current sentence; and feeding the sentence embedding vector of the current sentence to the second GRU model as input and decoding to obtain the predicted following sentence of the current sentence.
In implementation, before the step of obtaining the predicted preceding and following sentences of the current sentence of a context sentence pair sequence with the initial sequence-to-sequence model, the method further includes constructing the sentence sample set, which consists of context sentence pair sequences. The specific steps are:
1) Randomly select an arbitrary book text and split it into sentences based on Chinese punctuation, obtaining a book text D = [S_1, S_2, S_3, S_4, S_5, …, S_i, …, S_n], where S_i denotes the i-th sentence of book text D and n denotes the number of sentences obtained by splitting D. For example, given a book text collection of 3727 books together with the full text of each book, an arbitrary book text is selected at random and all of its text content is split into sentences.
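The sentence-splitting step can be sketched as follows; this is a minimal illustration, and the exact punctuation rule is an assumption, since the embodiment only states that splitting is based on Chinese punctuation:

```python
import re

def split_sentences(text):
    # Split after Chinese sentence-final punctuation (。！？), keeping the
    # delimiter attached to its sentence via a zero-width lookbehind.
    parts = re.split(r'(?<=[。！？])', text)
    return [s for s in parts if s.strip()]

D = split_sentences("第一句。第二句！第三句？第四句。")
```

Each element of `D` then plays the role of one sentence S_i of the book text.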
2) Construct context sentence pair sequences (sentence pairs) from the book text D, that is, traverse each sentence of D to build the context sentence pair sequences, yielding the sentence sample set G. The context sentence pair sequences are (S_1, S_2, S_3), (S_2, S_3, S_4), (S_3, S_4, S_5), …, (S_{i-1}, S_i, S_{i+1}), …, (S_{n-2}, S_{n-1}, S_n), where S_i denotes the current sentence, S_{i-1} denotes the preceding target sentence adjacent to S_i, and S_{i+1} denotes the following target sentence adjacent to S_i.
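The traversal of D into context sentence pair sequences can be sketched as a sliding window of width three (the helper name is illustrative):

```python
def build_sentence_pairs(sentences):
    # Each training example is (preceding sentence, current sentence,
    # following sentence), produced by sliding a window of size 3 over D.
    return [(sentences[i - 1], sentences[i], sentences[i + 1])
            for i in range(1, len(sentences) - 1)]

G = build_sentence_pairs(["S1", "S2", "S3", "S4", "S5"])
```

A document of n sentences thus yields n − 2 context sentence pair sequences.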
In implementation, the encoding layer (Encoder) of the initial sequence-to-sequence model outputs the sentence embedding vector h_s of the current sentence, which is fed simultaneously to the first decoding layer (pre-Decoder), used to predict the preceding sentence sequence, and to the second decoding layer (next-Decoder), used to predict the following sentence sequence; the pre-Decoder and next-Decoder then yield the predicted preceding sentence and predicted following sentence of the current sentence, respectively. As shown in Figure 3, the specific steps include:
1) Use a word segmentation tool (the HIT LTP model) to segment the sentences in every context sentence pair sequence of the sentence sample set G; a segmented sentence is represented as S_i = [t_1, t_2, …, t_p, …, t_l], where t_p denotes the p-th token of S_i and l denotes the number of tokens obtained after segmenting S_i.
2) Build the initial sequence-to-sequence model on the encoder-decoder architecture. The initial sequence-to-sequence model comprises one encoding layer and two decoding layers, and the base model of both the encoding layer and the decoding layers is the gated recurrent unit (GRU).
3) Feed the segmented sentence sample set G to the initial sequence-to-sequence model: the current sentence of each sentence pair sequence is input to the encoding layer (Encoder), which encodes it into the sentence embedding vector h_s of the current sentence; the first decoding layer (pre-Decoder) and the second decoding layer (next-Decoder) then decode h_s simultaneously, yielding the predicted preceding sentence and the predicted following sentence of the current sentence, respectively. Specifically:
① The current sentence of a context sentence pair sequence serves as the input to the encoding layer (Encoder) of the initial sequence-to-sequence model. Taking (S_{i-1}, S_i, S_{i+1}) as an example, the segmented sentence S_i = [t_1, t_2, …, t_p, …, t_l] is fed to the Encoder, which encodes it into the sentence embedding vector h_s of S_i.
② The sentence embedding vector h_s serves as the input to the first decoding layer (pre-Decoder, decoding the preceding sentence), which decodes it into the predicted preceding sentence Y_{i-1} of the current sentence. Because predicting the preceding sentence Y_{i-1} from the sentence embedding vector h_s of the current sentence S_i runs against the natural top-down order of language, the first decoding layer is harder to train than the second decoding layer (next-Decoder, decoding the following sentence). The GRU model architecture is therefore improved so as to raise the accuracy of preceding-sentence prediction while preserving training efficiency and preventing vanishing gradients. Specifically, the sentence embedding vector h_s of the current sentence is added, with corresponding parameters, to the inputs of the update gate, reset gate, and candidate memory unit of the first decoding layer, ensuring that during token-by-token generation the GRU model at every time step can draw on the sentence embedding vector h_s of the current sentence S_i. The formulas are as follows:
z_t = σ(W_z x_t + U_z h_{t-1} + V_z h_s)

r_t = σ(W_r x_t + U_r h_{t-1} + V_r h_s)

k_t = tanh(W_k x_t + U_k (r_t ⊙ h_{t-1}) + V_k h_s)

h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ k_t

where z_t denotes the update gate of the GRU model; W_z and U_z are the update-gate parameters of the original GRU model; x_t denotes the input vector at the current time step t; h_{t-1} denotes the vector passed from the previous time step t−1 to the current time step t; and V_z denotes the parameter set for the sentence embedding vector h_s. Likewise, the reset gate r_t and the candidate memory unit k_t of the GRU model both incorporate the sentence embedding h_s: W_r, U_r, and V_r denote the parameters of the reset gate; tanh denotes the activation function; W_k, U_k, and V_k denote the parameters of the candidate memory unit; h_t denotes the output vector at the current time step t; σ denotes a fully connected layer with an activation function; and ⊙ denotes element-wise multiplication of vectors.
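A single decoding step of this modified GRU can be sketched in plain Python. The scalar dimension, the parameter values, and the (1 − z_t) ⊙ h_{t-1} + z_t ⊙ k_t update convention are illustrative assumptions, not the patent's reference implementation; the point is that the V-terms inject the sentence embedding h_s into every gate:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def modified_gru_step(x_t, h_prev, h_s, p):
    # One (scalar-dimension) decoding step of the pre-Decoder GRU.
    z = sigmoid(p["Wz"] * x_t + p["Uz"] * h_prev + p["Vz"] * h_s)          # update gate z_t
    r = sigmoid(p["Wr"] * x_t + p["Ur"] * h_prev + p["Vr"] * h_s)          # reset gate r_t
    k = math.tanh(p["Wk"] * x_t + p["Uk"] * (r * h_prev) + p["Vk"] * h_s)  # candidate k_t
    return (1.0 - z) * h_prev + z * k                                      # hidden state h_t

params = {name: 0.1 for name in ("Wz", "Uz", "Vz", "Wr", "Ur", "Vr", "Wk", "Uk", "Vk")}
h_t = modified_gru_step(1.0, 0.0, 0.5, params)  # x_t = 1.0, h_{t-1} = 0.0, h_s = 0.5
```

In a real model each parameter would be a weight matrix and the step would run token by token over the decoder's inputs.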
③ In parallel with the first decoding layer, the sentence embedding vector h_s is input to the second decoding layer (next-Decoder), which decodes it into the predicted following sentence Y_{i+1} of the current sentence. Predicting the following sentence from the current sentence matches the natural top-down order of language, so the second decoding layer (next-Decoder) uses the existing GRU model, and the sentence embedding vector h_s serves only as the initial vector of the second decoding layer.
It can be seen that predicting the preceding sentence of the current sentence on the encoder-decoder framework breaks the top-down regularity of natural language and raises the difficulty of model training, so that the model is trained thoroughly and outputs sentence vector representations containing complete semantic and syntactic information. Further, the improvements to the update gate, reset gate, and candidate memory unit of the GRU model raise the difficulty of model training while effectively preserving the model's training efficiency.
Step 202: Train the initial sequence-to-sequence model with a target loss function based on the predicted preceding sentence and predicted following sentence of the current sentence, obtaining the trained sequence-to-sequence model. The target loss function is determined as the sum of a first loss function and a second loss function, where the first loss function in the target loss function is set based on the first decoding layer, which predicts the preceding sentence, and the second loss function in the target loss function is set based on the second decoding layer, which predicts the following sentence.
In implementation, the network parameters of the initialized sequence-to-sequence model are trained with the target loss function based on the preceding target sentence S_{i-1}, the following target sentence S_{i+1}, the predicted preceding sentence Y_{i-1}, and the predicted following sentence Y_{i+1}, until the initialized sequence-to-sequence model converges, yielding the trained sequence-to-sequence model. Specifically, the cross-entropy loss function is used as the basic loss function:

CE(S, Y) = −Σ_{j=1}^{l} log p(y_j = t_j)

where CE denotes the cross-entropy loss function, S denotes the current sentence, Y denotes the predicted sentence generated by the decoding layer (Decoder), l denotes the number of tokens obtained by segmenting the current sentence S, t_j denotes the j-th token of the segmented current sentence S, and y_j denotes the j-th token of the predicted sentence Y.
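The cross-entropy term can be illustrated with a toy decoder whose output at each step is a probability distribution over a small vocabulary; the tokens and probabilities below are invented for illustration:

```python
import math

def cross_entropy(target_tokens, step_distributions):
    # CE(S, Y): negative log-probability the decoder assigns to each target
    # token t_j, summed over the l tokens of the sentence.
    return -sum(math.log(dist[t]) for t, dist in zip(target_tokens, step_distributions))

# Toy three-token vocabulary; each dict is the decoder's distribution at step j.
step_distributions = [{"a": 0.7, "b": 0.2, "c": 0.1},
                      {"a": 0.1, "b": 0.8, "c": 0.1}]
loss = cross_entropy(["a", "b"], step_distributions)
```

The loss is smallest when the decoder concentrates probability mass on the target token at every step.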
Further, based on the first decoding layer (pre-Decoder) and the second decoding layer (next-Decoder), which output the predicted preceding sentence and the predicted following sentence respectively, the corresponding preceding-sentence loss function (the first loss function) and following-sentence loss function (the second loss function) are determined, giving the target loss function of the initialized sequence-to-sequence model as the sum of the preceding-sentence loss function and the following-sentence loss function:

Loss = CE(S_{i-1}, Y_{i-1}) + CE(S_{i+1}, Y_{i+1})

where CE(S_{i-1}, Y_{i-1}) denotes the preceding-sentence loss function pre-loss and CE(S_{i+1}, Y_{i+1}) denotes the following-sentence loss function next-loss.
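The combined objective can be sketched by evaluating the same token-level cross-entropy for each decoding layer and summing the two; the probability tables are invented for illustration:

```python
import math

def ce(targets, step_distributions):
    # Token-level cross-entropy, as the basic loss function above.
    return -sum(math.log(dist[t]) for t, dist in zip(targets, step_distributions))

# Hypothetical per-step output distributions of the two decoding layers.
pre_dists  = [{"a": 0.6, "b": 0.4}]   # pre-Decoder, predicting S_{i-1}
next_dists = [{"a": 0.3, "b": 0.7}]   # next-Decoder, predicting S_{i+1}
pre_loss  = ce(["a"], pre_dists)      # first loss function (pre-loss)
next_loss = ce(["b"], next_dists)     # second loss function (next-loss)
loss = pre_loss + next_loss           # target loss = pre-loss + next-loss
```

Both decoders are thus trained jointly: gradients from each term flow back through the shared encoder.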
According to the needs of the actual application scenario, the initialized sequence-to-sequence model is trained with a batch size of 128, 50 epochs, and a learning rate lr of 0.005, until the value of its target loss function stabilizes; training then ends and the trained sequence-to-sequence model is obtained.
Step 203: Perform semantic segmentation on the obtained initial sentence text to obtain segmented sentence text.
Step 204: Use the pre-built sentence vector generation model to obtain the vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of the trained sequence-to-sequence model.
In implementation, the encoding layer of the trained sequence-to-sequence model is extracted as the sentence vector generation model, so that after a book recommendation request is received, the synopsis text corresponding to the book title in the request is obtained, split into sentences based on Chinese punctuation, and segmented into words with the HIT LTP model to obtain the segmented sentence text; the sentence vector generation model then encodes the sentence text to obtain its vector representation.
Step 205: Calculate similarity values between the vector representation of the sentence text and the sentence embedding vectors in a preset book sample library, where the sentence embedding vectors in the preset book sample library are output by the sentence vector generation model.
In implementation, for the synopsis text of every book in the initial book sample library, the sentence vector generation model outputs the sentence embedding vector of the corresponding synopsis text, and the preset book sample library is built from these output sentence embedding vectors; the cosine similarity algorithm then computes the similarity value between the sentence vector output for the book recommendation request and the sentence embedding vector corresponding to each book in the preset book sample library.
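The cosine-similarity ranking step can be sketched as follows; the query vector, book names, and library embeddings are hypothetical two-dimensional stand-ins for the model's output vectors:

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

query = [1.0, 0.0]  # sentence vector generated for the recommendation request
library = {"book_a": [0.9, 0.1],   # preset book sample library: title -> embedding
           "book_b": [0.0, 1.0],
           "book_c": [0.7, 0.7]}
ranked = sorted(library, key=lambda k: cosine(query, library[k]), reverse=True)
```

The books at the top of `ranked` are the candidates whose similarity values satisfy the preset condition.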
Step 206: Generate book recommendation information for the sentence text based on the sentence embedding vectors in the preset book sample library whose similarity values satisfy a preset condition.
In implementation, when a user browses a book on the platform, that book is taken as the target book and a book recommendation request containing the target book's title is generated. From the synopsis text corresponding to the target book's title, the sentence vector generation model generates the corresponding sentence vector; the similarity values between the generated sentence vector and each sentence embedding vector in the platform's preset book sample library are then computed and sorted in descending order, so that the book information corresponding to the sentence embedding vectors whose similarity values satisfy the preset condition is recommended to the user as similar books. Experiments show that, according to online A/B test results, the user click-through rate obtained with this embodiment is effectively improved by 2.31%.
By applying the technical solution of this embodiment, the obtained initial sentence text is semantically segmented to obtain segmented sentence text, and the pre-built sentence vector generation model — the encoding layer of a trained sequence-to-sequence model — obtains the vector representation of the sentence text through encoding processing used to predict the context of the sentence text. The trained sequence-to-sequence model is obtained through the following steps: using the initial sequence-to-sequence model, the current sentence in each context sentence pair sequence of the constructed sentence sample set is encoded and context-decoded to obtain the predicted preceding sentence and predicted following sentence of the current sentence; the trained sequence-to-sequence model is then obtained from these predictions. It can be seen that training a sequence-to-sequence model on context sentence pair sequences and generating sentence vectors with the encoding layer of the trained model raises the difficulty of model training while effectively improving the accuracy of sentence vector generation and preserving the integrity of the semantic and syntactic information of the generated sentence vectors. This effectively avoids the technical problems of existing methods: construction from averaged word vectors destroys the dependencies between the words of a sentence, so sentence features are extracted with low accuracy, while construction based on contrastive learning makes model training insufficiently difficult, leaves the model with inadequate transfer ability in practical tasks, and yields sentence vectors of low accuracy.
Further, as a specific implementation of the method of Figure 1, an embodiment of the present application provides a sentence vector generation apparatus. As shown in Figure 4, the apparatus includes a model training module 41, a preprocessing module 42, and an encoding module 43.
The model training module 41 may be configured to use the initial sequence-to-sequence model to encode and context-decode the current sentence in each context sentence pair sequence of the constructed sentence sample set, obtaining the predicted preceding sentence and predicted following sentence of the current sentence, and to obtain the trained sequence-to-sequence model from the predicted preceding and following sentences.
The preprocessing module 42 may be configured to perform semantic segmentation on the obtained initial sentence text to obtain segmented sentence text.
The encoding module 43 may be configured to use the pre-built sentence vector generation model to obtain the vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of the trained sequence-to-sequence model.
In a specific application scenario, as shown in Figure 5, the apparatus further includes a book recommendation module 44.
In a specific application scenario, the model training module 41 includes a training unit 411.
The training unit 411 may be configured to train the initial sequence-to-sequence model with the target loss function based on the predicted preceding sentence and predicted following sentence of the current sentence, obtaining the trained sequence-to-sequence model, where the target loss function is determined as the sum of the first loss function and the second loss function.
In a specific application scenario, a context sentence pair sequence specifically includes: the current sentence, input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction; and the preceding target sentence and following target sentence used to supervise the output results of the initial sequence-to-sequence model, the output results being the predicted preceding sentence and predicted following sentence produced during model training.
In a specific application scenario, the model training module 41 may specifically be configured to: perform word segmentation on the context sentence pair sequence with a word segmentation tool to obtain the segmented context sentence pair sequence; feed the current sentence of the segmented context sentence pair sequence to the encoding layer of the initial sequence-to-sequence model to obtain the sentence embedding vector of the current sentence; and, from the sentence embedding vector of the current sentence, obtain the predicted preceding sentence and predicted following sentence with the two decoding layers arranged in parallel in the initial sequence-to-sequence model, where the two decoding layers are the first decoding layer for predicting the preceding sentence and the second decoding layer for predicting the following sentence.
In a specific application scenario, the first decoding layer for predicting the preceding sentence is a first GRU model and the second decoding layer for predicting the following sentence is a second GRU model. The step of obtaining the predicted preceding and following sentences from the sentence embedding vector of the current sentence with the two parallel decoding layers specifically includes: feeding the sentence embedding vector of the current sentence to the reset gate, update gate, and candidate memory unit of the first GRU model and decoding to obtain the predicted preceding sentence of the current sentence; and feeding the sentence embedding vector of the current sentence to the second GRU model as input and decoding to obtain the predicted following sentence of the current sentence.
In a specific application scenario, the first loss function in the target loss function is set based on the first decoding layer, which predicts the preceding sentence, and the second loss function in the target loss function is set based on the second decoding layer, which predicts the following sentence.
In a specific application scenario, the book recommendation module 44 includes a similarity calculation unit 441 and a generation unit 442.
The similarity calculation unit 441 may be configured to calculate the similarity values between the vector representation of the sentence text and the sentence embedding vectors in the preset book sample library.
The generation unit 442 may be configured to generate book recommendation information for the sentence text based on the sentence embedding vectors in the preset book sample library whose similarity values satisfy the preset condition, where the sentence embedding vectors in the preset book sample library are output by the sentence vector generation model.
It should be noted that, for other corresponding descriptions of the functional units involved in the sentence vector generation apparatus provided by this embodiment of the present application, reference may be made to the corresponding descriptions of Figures 1 and 2, which are not repeated here.
Based on the methods shown in Figures 1 and 2 above, an embodiment of the present application correspondingly further provides a storage medium storing a computer program which, when executed by a processor, implements the sentence vector generation method of Figures 1 and 2, including:
performing semantic segmentation on the obtained initial sentence text to obtain segmented sentence text;
using a pre-built sentence vector generation model to obtain the vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
wherein the trained sequence-to-sequence model is obtained through the following steps:
using the initial sequence-to-sequence model, encoding and context-decoding the current sentence in each context sentence pair sequence of the constructed sentence sample set to obtain the predicted preceding sentence and predicted following sentence of the current sentence;
根据上文预测句子和下文预测句子,得到训练好的序列到序列模型。Based on the above predicted sentences and the following predicted sentences, the trained sequence-to-sequence model is obtained.
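The training procedure recited above — one shared encoding layer feeding two parallel decoding layers whose losses are summed — can be sketched structurally as follows. Every component here is a toy stand-in (a sum-of-embeddings encoder, squared-error decoder losses, invented parameter names); the patent's actual model uses GRU decoding layers with learned parameters:

```python
def encode(tokens, params):
    # Stand-in encoding layer: sums per-token embedding vectors.
    dim = 2
    vec = [0.0] * dim
    for tok in tokens:
        for i, v in enumerate(params["emb"].get(tok, [0.0] * dim)):
            vec[i] += v
    return vec

def decoder_loss(sentence_vec, target_tokens, params, head):
    # Stand-in for one decoding layer's loss: the "w_prev" head supervises
    # the predicted preceding sentence, "w_next" the predicted following one.
    target = encode(target_tokens, params)
    w = params[head]
    return sum((w * s - t) ** 2 for s, t in zip(sentence_vec, target))

def training_step(prev_tokens, cur_tokens, next_tokens, params):
    vec = encode(cur_tokens, params)                              # encode current sentence
    loss_prev = decoder_loss(vec, prev_tokens, params, "w_prev")  # first decoding layer
    loss_next = decoder_loss(vec, next_tokens, params, "w_next")  # second decoding layer
    return loss_prev + loss_next                                  # target loss = first + second

params = {"emb": {"a": [1.0, 0.0], "b": [0.0, 1.0]}, "w_prev": 1.0, "w_next": 0.5}
total = training_step(["a"], ["a"], ["b"], params)
```

After training, only `encode` (the encoding layer) would be kept as the sentence vector generation model; the two decoder heads exist solely to supply the training signal.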
Optionally, the step of obtaining the trained sequence-to-sequence model based on the predicted preceding sentence and the predicted following sentence specifically includes:
training the initial sequence-to-sequence model with a target loss function based on the predicted preceding sentence and the predicted following sentence of the current sentence, to obtain the trained sequence-to-sequence model;
wherein the target loss function is determined as the sum of a first loss function and a second loss function.
Optionally, the context sentence pair sequence specifically includes:
the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction;
and a preceding target sentence and a following target sentence, which are used to supervise the output of the initial sequence-to-sequence model, the output being the predicted preceding sentence and predicted following sentence produced during model training.
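The context sentence pair sequence described above — a current sentence serving as encoder input, flanked by the preceding and following target sentences serving as supervision — can be built from an ordered list of document sentences with a simple sliding window, as in this illustrative sketch (the function name and tuple layout are assumptions, not taken from the patent):

```python
def build_context_pairs(sentences):
    # For each interior sentence, form (preceding target, current sentence,
    # following target): the current sentence is fed to the encoding layer,
    # and the two targets supervise the two decoding layers' outputs.
    return [(sentences[i - 1], sentences[i], sentences[i + 1])
            for i in range(1, len(sentences) - 1)]

doc = ["S1.", "S2.", "S3.", "S4."]
pairs = build_context_pairs(doc)
```

A document of n sentences yields n - 2 training triples; documents shorter than three sentences contribute none.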
Optionally, the storage medium is a computer-readable storage medium, which may be non-volatile or volatile.
Based on this understanding, the technical solution of the present application may be embodied in the form of a software product. The software product may be stored in a storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the implementation scenarios of this application.
Based on the methods shown in Figures 1 and 2 and the virtual apparatus embodiments shown in Figures 4 and 5, to achieve the above objectives, an embodiment of the present application further provides a computer device, which may specifically be a personal computer, a server, a network device, or the like. The physical device includes a storage medium and a processor; the storage medium is used to store a computer program; the processor is used to execute the computer program to implement the sentence vector generation method shown in Figures 1 and 2, including:
performing semantic segmentation on the acquired initial sentence text to obtain segmented sentence text;
using a pre-built sentence vector generation model, obtaining a vector representation of the sentence text through an encoding process used for predicting the context of the sentence text, where the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model;
wherein the trained sequence-to-sequence model is obtained through the following steps:
using an initial sequence-to-sequence model, performing encoding and context decoding on the current sentence in the context sentence pair sequence of a constructed sentence sample set, to obtain a predicted preceding sentence and a predicted following sentence for the current sentence;
obtaining the trained sequence-to-sequence model based on the predicted preceding sentence and the predicted following sentence.
Optionally, the step of obtaining the trained sequence-to-sequence model based on the predicted preceding sentence and the predicted following sentence specifically includes:
training the initial sequence-to-sequence model with a target loss function based on the predicted preceding sentence and the predicted following sentence of the current sentence, to obtain the trained sequence-to-sequence model;
wherein the target loss function is determined as the sum of a first loss function and a second loss function.
Optionally, the context sentence pair sequence specifically includes:
the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction;
and a preceding target sentence and a following target sentence, which are used to supervise the output of the initial sequence-to-sequence model, the output being the predicted preceding sentence and predicted following sentence produced during model training.
Optionally, the computer device may further include a user interface, a network interface, a camera, a radio frequency (RF) circuit, sensors, an audio circuit, a Wi-Fi module, and so on. The user interface may include a display and an input unit such as a keyboard, and may optionally also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (such as a Bluetooth or Wi-Fi interface), and the like.
Those skilled in the art will understand that the computer device structure provided in this embodiment does not limit the physical device, which may include more or fewer components, combine certain components, or arrange the components differently.
The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the computer device and supports the operation of the information processing program and other software and/or programs. The network communication module is used to implement communication between the components within the storage medium, as well as communication with other hardware and software in the physical device.
From the description of the above embodiments, those skilled in the art will clearly understand that the present application may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. Compared with existing sentence vector generation schemes, such as constructions based on averaged word vectors or on contrastive learning, this embodiment trains a sequence-to-sequence model on context sentence pair sequences and generates sentence vectors for sentence text with the encoding layer of the trained model. This preserves the integrity of the semantic and syntactic information of the sentence text and thereby effectively improves the accuracy of sentence vector generation. It thus avoids the technical problems of the existing approaches: construction methods based on averaged word vectors destroy the dependency relations between the words of a sentence, leading to low accuracy of sentence feature extraction, while construction methods based on contrastive learning pose too little training difficulty, so the model transfers poorly to real tasks and the generated sentence vectors are of low accuracy.
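The point about averaged word vectors destroying inter-word dependencies can be made concrete with a toy example: two sentences containing the same words in a different order — and hence with different dependency structure — map to identical averaged vectors. The vocabulary and embeddings below are invented purely for illustration:

```python
def average_word_vectors(tokens, emb):
    # The word-vector-averaging construction criticized above: the mean
    # of per-token embeddings, which is invariant to word order.
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    for t in tokens:
        for i, v in enumerate(emb[t]):
            total[i] += v
    return [x / len(tokens) for x in total]

emb = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [1.0, 1.0]}
s1 = average_word_vectors(["dog", "bites", "man"], emb)
s2 = average_word_vectors(["man", "bites", "dog"], emb)
```

Although "dog bites man" and "man bites dog" mean different things, `s1 == s2`, so any downstream task sees them as the same sentence; an encoder trained to predict context, by contrast, processes the tokens in order.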
Those skilled in the art will understand that the accompanying drawings are merely schematic diagrams of preferred implementation scenarios, and that the modules or processes in the drawings are not necessarily required to implement the present application. Those skilled in the art will also understand that the modules of the apparatus in an implementation scenario may be distributed in the apparatus of that scenario as described, or may be correspondingly changed so as to be located in one or more apparatuses different from that of the present implementation scenario. The modules of the above implementation scenarios may be combined into one module, or further split into multiple sub-modules.
The serial numbers of the present application above are for description only and do not indicate the relative merits of the implementation scenarios. What is disclosed above are only a few specific implementation scenarios of the present application; however, the present application is not limited thereto, and any change conceivable by those skilled in the art shall fall within the protection scope of the present application.

Claims (20)

  1. A sentence vector generation method, comprising:
    performing semantic segmentation on acquired initial sentence text to obtain segmented sentence text;
    using a pre-built sentence vector generation model, obtaining a vector representation of the sentence text through an encoding process used for predicting the context of the sentence text, wherein the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model;
    wherein the trained sequence-to-sequence model is obtained through the following steps:
    using an initial sequence-to-sequence model, performing encoding and context decoding on a current sentence in a context sentence pair sequence of a constructed sentence sample set, to obtain a predicted preceding sentence and a predicted following sentence for the current sentence;
    obtaining the trained sequence-to-sequence model based on the predicted preceding sentence and the predicted following sentence.
  2. The method according to claim 1, wherein the step of obtaining the trained sequence-to-sequence model based on the predicted preceding sentence and the predicted following sentence specifically comprises:
    training the initial sequence-to-sequence model with a target loss function based on the predicted preceding sentence and the predicted following sentence of the current sentence, to obtain the trained sequence-to-sequence model;
    wherein the target loss function is determined as the sum of a first loss function and a second loss function.
  3. The method according to claim 1 or 2, wherein the context sentence pair sequence specifically comprises:
    the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction;
    and a preceding target sentence and a following target sentence, which are used to supervise the output of the initial sequence-to-sequence model, the output being the predicted preceding sentence and predicted following sentence produced during model training.
  4. The method according to claim 1, wherein the step of using the initial sequence-to-sequence model to perform encoding and context decoding on the current sentence in the context sentence pair sequence of the constructed sentence sample set, to obtain the predicted preceding sentence and the predicted following sentence for the current sentence, specifically comprises:
    performing word segmentation on the context sentence pair sequence with a word segmentation tool, to obtain a word-segmented context sentence pair sequence;
    obtaining a sentence embedding vector of the current sentence from the current sentence in the word-segmented context sentence pair sequence, using the encoding layer of the initial sequence-to-sequence model;
    obtaining the predicted preceding sentence and the predicted following sentence respectively from the sentence embedding vector of the current sentence, using two decoding layers arranged in parallel in the initial sequence-to-sequence model;
    wherein the two decoding layers are a first decoding layer for predicting the preceding context and a second decoding layer for predicting the following context.
  5. The method according to claim 4, wherein the first decoding layer for predicting the preceding context is a first GRU model, the second decoding layer for predicting the following context is a second GRU model, and the step of obtaining the predicted preceding sentence and the predicted following sentence respectively from the sentence embedding vector of the current sentence, using the two decoding layers arranged in parallel in the initial sequence-to-sequence model, specifically comprises:
    using the sentence embedding vector of the current sentence as the input data of the reset gate, the update gate, and the candidate memory cell in the first GRU model, and obtaining the predicted preceding sentence of the current sentence through decoding;
    using the sentence embedding vector of the current sentence as the input data of the second GRU model, and obtaining the predicted following sentence of the current sentence through decoding.
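By way of illustration, the decoding step recited in claim 5 — feeding an input vector into the reset gate, update gate, and candidate memory cell of a GRU — can be sketched as a single toy, scalar GRU update. The parameter names (w_r, u_r, etc.) and values are illustrative assumptions, not taken from the patent:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h_prev, x, p):
    # One update of a scalar GRU cell. The input x (standing in for the
    # sentence embedding vector) feeds all three components named in the
    # claim: reset gate r, update gate z, and candidate memory cell h_tilde.
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev)                # reset gate
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev)                # update gate
    h_tilde = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev))  # candidate memory cell
    return (1.0 - z) * h_prev + z * h_tilde                      # new hidden state

p = {"w_r": 0.5, "u_r": 0.1, "w_z": 0.5, "u_z": 0.1, "w_h": 1.0, "u_h": 0.5}
h = 0.0
for x in [1.0, -1.0, 0.5]:   # decoder steps conditioned on the input
    h = gru_step(h, x, p)
```

In the full model, the hidden state at each decoder step would be projected onto the vocabulary to emit the next token of the predicted preceding (first GRU) or following (second GRU) sentence.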
  6. The method according to claim 2 or 4, wherein the first loss function in the target loss function is set based on the first decoding layer for predicting the preceding context, and the second loss function in the target loss function is set based on the second decoding layer for predicting the following context.
  7. The method according to claim 1, wherein after the step of obtaining the vector representation of the sentence text with the sentence vector generation model through the encoding process used for predicting the context of the sentence text, the method further comprises:
    calculating similarity values between the vector representation of the sentence text and sentence embedding vectors in a preset book sample library;
    generating book recommendation information for the sentence text based on the sentence embedding vectors in the preset book sample library whose similarity values satisfy a preset condition;
    wherein the sentence embedding vectors in the preset book sample library are obtained from the output of the sentence vector generation model.
  8. A sentence vector generation apparatus, comprising:
    a model training module, configured to use an initial sequence-to-sequence model to perform encoding and context decoding on a current sentence in a context sentence pair sequence of a constructed sentence sample set, to obtain a predicted preceding sentence and a predicted following sentence for the current sentence; and to obtain a trained sequence-to-sequence model based on the predicted preceding sentence and the predicted following sentence;
    a preprocessing module, configured to perform semantic segmentation on acquired initial sentence text to obtain segmented sentence text;
    an encoding module, configured to use a pre-built sentence vector generation model to obtain a vector representation of the sentence text through an encoding process used for predicting the context of the sentence text, wherein the sentence vector generation model is the encoding layer of the trained sequence-to-sequence model.
  9. The apparatus according to claim 8, wherein the model training module specifically comprises:
    a training unit, configured to train the initial sequence-to-sequence model with a target loss function based on the predicted preceding sentence and the predicted following sentence of the current sentence, to obtain the trained sequence-to-sequence model;
    wherein the target loss function is determined as the sum of a first loss function and a second loss function.
  10. The apparatus according to claim 8 or 9, wherein the context sentence pair sequence specifically comprises:
    the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction;
    and a preceding target sentence and a following target sentence, which are used to supervise the output of the initial sequence-to-sequence model, the output being the predicted preceding sentence and predicted following sentence produced during model training.
  11. The apparatus according to claim 8, wherein the model training module is specifically configured to:
    perform word segmentation on the context sentence pair sequence with a word segmentation tool, to obtain a word-segmented context sentence pair sequence;
    obtain a sentence embedding vector of the current sentence from the current sentence in the word-segmented context sentence pair sequence, using the encoding layer of the initial sequence-to-sequence model;
    obtain the predicted preceding sentence and the predicted following sentence respectively from the sentence embedding vector of the current sentence, using two decoding layers arranged in parallel in the initial sequence-to-sequence model;
    wherein the two decoding layers are a first decoding layer for predicting the preceding context and a second decoding layer for predicting the following context.
  12. The apparatus according to claim 11, wherein the first decoding layer for predicting the preceding context is a first GRU model, the second decoding layer for predicting the following context is a second GRU model, and obtaining the predicted preceding sentence and the predicted following sentence respectively from the sentence embedding vector of the current sentence, using the two decoding layers arranged in parallel in the initial sequence-to-sequence model, specifically comprises:
    using the sentence embedding vector of the current sentence as the input data of the reset gate, the update gate, and the candidate memory cell in the first GRU model, and obtaining the predicted preceding sentence of the current sentence through decoding;
    using the sentence embedding vector of the current sentence as the input data of the second GRU model, and obtaining the predicted following sentence of the current sentence through decoding.
  13. The apparatus according to claim 9 or 11, wherein the first loss function in the target loss function is set based on the first decoding layer for predicting the preceding context, and the second loss function in the target loss function is set based on the second decoding layer for predicting the following context.
  14. The apparatus according to claim 8, further comprising a book recommendation module, which specifically comprises:
    a similarity calculation unit, configured to calculate similarity values between the vector representation of the sentence text and sentence embedding vectors in a preset book sample library;
    a generation unit, configured to generate book recommendation information for the sentence text based on the sentence embedding vectors in the preset book sample library whose similarity values satisfy a preset condition;
    wherein the sentence embedding vectors in the preset book sample library are obtained from the output of the sentence vector generation model.
  15. A computer device, comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, wherein the processor, when executing the program, implements a sentence vector generation method comprising:
    performing semantic segmentation on acquired initial sentence text to obtain segmented sentence text;
    using a pre-built sentence vector generation model, obtaining a vector representation of the sentence text through an encoding process used for predicting the context of the sentence text, wherein the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model;
    wherein the trained sequence-to-sequence model is obtained through the following steps:
    using an initial sequence-to-sequence model, performing encoding and context decoding on a current sentence in a context sentence pair sequence of a constructed sentence sample set, to obtain a predicted preceding sentence and a predicted following sentence for the current sentence;
    obtaining the trained sequence-to-sequence model based on the predicted preceding sentence and the predicted following sentence.
  16. The computer device according to claim 15, wherein the step of obtaining the trained sequence-to-sequence model based on the predicted preceding sentence and the predicted following sentence specifically comprises:
    training the initial sequence-to-sequence model with a target loss function based on the predicted preceding sentence and the predicted following sentence of the current sentence, to obtain the trained sequence-to-sequence model;
    wherein the target loss function is determined as the sum of a first loss function and a second loss function.
  17. The computer device according to claim 15 or 16, wherein the context sentence pair sequence specifically comprises:
    the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction;
    and a preceding target sentence and a following target sentence, which are used to supervise the output of the initial sequence-to-sequence model, the output being the predicted preceding sentence and predicted following sentence produced during model training.
  18. A storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements a sentence vector generation method comprising:
    performing semantic segmentation on acquired initial sentence text to obtain segmented sentence text;
    using a pre-built sentence vector generation model, obtaining a vector representation of the sentence text through an encoding process used for predicting the context of the sentence text, wherein the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model;
    wherein the trained sequence-to-sequence model is obtained through the following steps:
    using an initial sequence-to-sequence model, performing encoding and context decoding on a current sentence in a context sentence pair sequence of a constructed sentence sample set, to obtain a predicted preceding sentence and a predicted following sentence for the current sentence;
    obtaining the trained sequence-to-sequence model based on the predicted preceding sentence and the predicted following sentence.
  19. The storage medium according to claim 18, wherein the step of obtaining the trained sequence-to-sequence model based on the predicted preceding sentence and the predicted following sentence specifically comprises:
    training the initial sequence-to-sequence model with a target loss function based on the predicted preceding sentence and the predicted following sentence of the current sentence, to obtain the trained sequence-to-sequence model;
    wherein the target loss function is determined as the sum of a first loss function and a second loss function.
  20. The storage medium according to claim 18 or 19, wherein the context sentence pair sequence specifically comprises:
    the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction;
    and a preceding target sentence and a following target sentence, which are used to supervise the output of the initial sequence-to-sequence model, the output being the predicted preceding sentence and predicted following sentence produced during model training.
PCT/CN2022/089817 2022-03-09 2022-04-28 Sentence vector generation method and apparatus, computer device and storage medium WO2023168814A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210232057.9A CN114444471A (en) 2022-03-09 2022-03-09 Sentence vector generation method and device, computer equipment and storage medium
CN202210232057.9 2022-03-09

Publications (1)

Publication Number Publication Date
WO2023168814A1 true WO2023168814A1 (en) 2023-09-14

Family

ID=81359057

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089817 WO2023168814A1 (en) 2022-03-09 2022-04-28 Sentence vector generation method and apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN114444471A (en)
WO (1) WO2023168814A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178082A (en) * 2019-12-05 2020-05-19 北京葡萄智学科技有限公司 Sentence vector generation method and device and electronic equipment
US20200218780A1 (en) * 2019-01-03 2020-07-09 International Business Machines Corporation Automated contextual dialog generation for cognitive conversation
WO2020151688A1 (en) * 2019-01-24 2020-07-30 腾讯科技(深圳)有限公司 Coding method and device, equipment and storage medium
CN111602128A (en) * 2017-10-27 2020-08-28 巴比伦合伙有限公司 Computer-implemented method and system for determining
CN112052329A (en) * 2020-09-02 2020-12-08 平安科技(深圳)有限公司 Text abstract generation method and device, computer equipment and readable storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RYAN KIROS, YUKUN ZHU, RUSLAN SALAKHUTDINOV, RICHARD S ZEMEL, ANTONIO TORRALBA, RAQUEL URTASUN, SANJA FIDLER: "Skip-Thought Vectors", 22 June 2015 (2015-06-22), XP055428189, Retrieved from the Internet <URL:https://arxiv.org/pdf/1506.06726.pdf> *

Also Published As

Publication number Publication date
CN114444471A (en) 2022-05-06

Similar Documents

Publication Publication Date Title
CN111444340B (en) Text classification method, device, equipment and storage medium
WO2022007823A1 (en) Text data processing method and device
CN111967266A (en) Chinese named entity recognition model and construction method and application thereof
WO2022022421A1 (en) Language representation model system, pre-training method and apparatus, device and medium
CN110163181B (en) Sign language identification method and device
CN112131366A (en) Method, device and storage medium for training text classification model and text classification
CN111159485B (en) Tail entity linking method, device, server and storage medium
WO2020244475A1 (en) Method and apparatus for language sequence labeling, storage medium, and computing device
CN113051356B (en) Open relation extraction method and device, electronic equipment and storage medium
CN113553412B (en) Question-answering processing method, question-answering processing device, electronic equipment and storage medium
US20220414400A1 (en) Multi-dimensional language style transfer
CN114912450B (en) Information generation method and device, training method, electronic device and storage medium
WO2022228127A1 (en) Element text processing method and apparatus, electronic device, and storage medium
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
JP2023002690A (en) Semantics recognition method, apparatus, electronic device, and storage medium
CN116541492A (en) Data processing method and related equipment
CN116050425A (en) Method for establishing pre-training language model, text prediction method and device
CN115408488A (en) Segmentation method and system for novel scene text
CN110188158B (en) Keyword and topic label generation method, device, medium and electronic equipment
CN115114407A (en) Intention recognition method and device, computer equipment and storage medium
CN117275466A (en) Business intention recognition method, device, equipment and storage medium thereof
WO2023116572A1 (en) Word or sentence generation method and related device
WO2023137903A1 (en) Reply statement determination method and apparatus based on rough semantics, and electronic device
CN115357710B (en) Training method and device for table description text generation model and electronic equipment
WO2023168814A1 (en) Sentence vector generation method and apparatus, computer device and storage medium

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22930443

Country of ref document: EP

Kind code of ref document: A1