CN114444471A - Sentence vector generation method and device, computer equipment and storage medium


Info

Publication number
CN114444471A
CN114444471A (application CN202210232057.9A)
Authority
CN
China
Prior art keywords: sentence, sequence, model, context, text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210232057.9A
Other languages
Chinese (zh)
Inventor
陈浩 (Chen Hao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202210232057.9A
Priority to PCT/CN2022/089817 (WO2023168814A1)
Publication of CN114444471A
Legal status: Withdrawn

Classifications

    • G06F 40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 16/335: Filtering based on additional data, e.g. user or group profiles
    • G06F 16/9535: Search customisation based on user profiles and personalisation
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/30: Semantic analysis
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06Q 30/0631: Item recommendations

Abstract

The application discloses a sentence vector generation method and device, a computer device, and a storage medium, relates to the technical field of artificial intelligence, and can improve the accuracy of sentence vector generation. The method comprises the following steps: performing semantic segmentation on an obtained initial sentence text to obtain a segmented sentence text; and obtaining a vector representation of the sentence text with a pre-constructed sentence vector generation model, through encoding processing that predicts the context of the sentence text, wherein the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model. The method and device are suitable for book recommendation based on sentence vectors of book texts.

Description

Sentence vector generation method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a sentence vector generation method, apparatus, computer device, and storage medium.
Background
Natural language processing is an important direction in the fields of computer science and artificial intelligence, and sentence vectors (sentence embeddings) are widely used as vector representations of text data in many natural language processing applications. Text data is mapped into a quantifiable vector space to obtain sentence vector representations that carry information such as textual features, semantics, and grammar; relations between sentences can then be derived with methods such as vector clustering and classification, enabling the application of sentence vectors in practical scenarios.
Existing solutions for sentence vector construction mainly comprise construction methods based on word vector averaging and construction methods based on contrastive learning. Construction methods based on word vector averaging use models such as word2vec, GloVe, and BERT; construction methods based on contrastive learning build positive samples in different ways, such as dropout, replacement, deletion, and back-translation. The existing solutions have the following defects: 1) construction methods based on word vector averaging destroy the dependency relationships among words in a sentence, so the accuracy of feature extraction is low; 2) although construction methods based on contrastive learning offer many ways to obtain positive samples, the randomly selected negative samples have low similarity to the original sentence, so the training difficulty of the model is low, the migration capability of the model to practical tasks is insufficient, and the accuracy of the generated sentence vectors is low.
Disclosure of Invention
In view of the above, the present application provides a sentence vector generation method, apparatus, computer device, and storage medium, and mainly aims to solve the technical problems in the prior art that construction methods based on word vector averaging extract sentence features with low accuracy, and that construction methods based on contrastive learning yield models whose migration capability in practical tasks is insufficient, so the accuracy of the generated sentence vectors is low.
According to an aspect of the present application, there is provided a sentence vector generation method, including:
performing semantic segmentation on an obtained initial sentence text to obtain a segmented sentence text;
obtaining a vector representation of the sentence text with a pre-constructed sentence vector generation model, through encoding processing that predicts the context of the sentence text, wherein the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model;
wherein the trained sequence-to-sequence model is obtained by the following steps:
performing, with an initial sequence-to-sequence model, encoding processing and context decoding processing on the current sentence in each context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence;
and obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
According to another aspect of the present application, there is provided a sentence vector generation apparatus, the apparatus comprising:
a model training module, configured to perform, with an initial sequence-to-sequence model, encoding processing and context decoding processing on the current sentence in each context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence, and to obtain a trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence;
a preprocessing module, configured to perform semantic segmentation on an obtained initial sentence text to obtain a segmented sentence text;
and an encoding module, configured to obtain a vector representation of the sentence text with a pre-constructed sentence vector generation model, through encoding processing that predicts the context of the sentence text, the sentence vector generation model being the encoding layer of the trained sequence-to-sequence model.
According to yet another aspect of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above sentence vector generation method.
According to yet another aspect of the present application, there is provided a computer device comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, the processor implementing the sentence vector generation method when executing the program.
By means of the above technical solution, compared with existing mainstream sentence vector generation schemes such as construction based on word vector averaging and construction based on contrastive learning, the sentence vector generation method, apparatus, computer device, and storage medium provided by the present application perform semantic segmentation on an obtained initial sentence text to obtain a segmented sentence text, and obtain a vector representation of the sentence text with a pre-constructed sentence vector generation model through encoding processing that predicts the context of the sentence text, wherein the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model. The trained sequence-to-sequence model is obtained by the following steps: performing, with an initial sequence-to-sequence model, encoding processing and context decoding processing on the current sentence in each context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence; and obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence. Training the sequence-to-sequence model on context sentence pair sequences and generating sentence vectors with the encoding layer of the trained model raises the training difficulty of the model while effectively improving the accuracy of sentence vector generation, and preserves the integrity of the semantic and syntactic information of the generated sentence vectors. This avoids the technical problems that existing construction methods based on word vector averaging destroy the dependency relationships among words in a sentence, leading to low accuracy of sentence feature extraction, and that construction methods based on contrastive learning suffer from low model training difficulty, insufficient migration capability in practical tasks, and low accuracy of the generated sentence vectors.
The foregoing description is only an overview of the technical solutions of the present application. In order that the technical means of the present application may be understood more clearly and implemented according to the content of this specification, and in order to make the above and other objects, features, and advantages of the present application more readily apparent, the detailed description of the present application is given below.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a flowchart illustrating a sentence vector generation method provided in an embodiment of the present application;
FIG. 2 is a flow chart of another sentence vector generation method provided in the embodiments of the present application;
FIG. 3 is a diagram illustrating an initial sequence-to-sequence model architecture provided by an embodiment of the present application;
fig. 4 is a schematic structural diagram of a sentence vector generation apparatus provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of another sentence vector generation apparatus provided in the embodiment of the present application.
Detailed Description
The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The embodiments of the present application can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
Aiming at the technical problems in the prior art that construction methods based on word vector averaging extract sentence features with low accuracy, and that construction methods based on contrastive learning yield models with insufficient migration capability in practical tasks and therefore generate sentence vectors with low accuracy, this embodiment provides a sentence vector generation method. As shown in fig. 1, the method is described as applied to a computer device such as a server, where the server may be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), big data platforms, and artificial intelligence platforms. The method comprises the following steps:
Step 101, performing semantic segmentation on the obtained initial sentence text to obtain a segmented sentence text.
In this embodiment, a book recommendation scenario is taken as an example, in which other similar books are recommended based on the obtained book text content. Specifically, when a book recommendation request is received, the book text content corresponding to the book title in the request is obtained, the book text content is split into sentences based on Chinese punctuation marks, and a plurality of sentence texts to be input into the sentence vector generation model are obtained through text segmentation. The book text content may be a book abstract text, a book introduction text, or the like, according to the requirements of the actual application scenario, and is not limited herein.
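As an illustration of this splitting step, the following is a minimal sketch assuming sentences end with common Chinese sentence-final punctuation; the function name and rule set are illustrative, not specified by the embodiment:

```python
import re

def split_sentences(book_text: str):
    """Split Chinese book text into sentences on sentence-final punctuation.

    A simplified sketch: splits after 。！？… and keeps the punctuation
    attached to each sentence.
    """
    parts = re.split(r'(?<=[。！？…])', book_text)
    return [s.strip() for s in parts if s.strip()]

# Example: an introduction text split into candidate sentences.
sentences = split_sentences("这是一本关于机器学习的书。内容深入浅出！适合初学者阅读。")
```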
Step 102, obtaining a vector representation of the sentence text with a pre-constructed sentence vector generation model, through encoding processing that predicts the context of the sentence text, wherein the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model. The trained sequence-to-sequence model is obtained by the following steps: performing, with an initial sequence-to-sequence model, encoding processing and context decoding processing on the current sentence in each context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence; and obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
In this embodiment, the initial sequence-to-sequence model is trained on a constructed sentence sample set consisting of context sentence pair sequences, where each context sentence pair sequence contains a current sentence and its corresponding context sentences. The current sentence is input into the encoding layer of the initial sequence-to-sequence model for encoding, yielding a vector representation containing the context feature information of the current sentence. This vector representation is then fed into the two decoding layers arranged in parallel in the initial sequence-to-sequence model, and the preceding predicted sentence and the following predicted sentence of the current sentence are obtained through decoding. Further, the trained sequence-to-sequence model is obtained by using the preceding and following sentences of the current sentence in the context sentence pair sequence as the training targets for the preceding and following predicted sentences. The encoding layer of the trained sequence-to-sequence model thus has the encoding capability to accurately predict the context of the current sentence while preserving the integrity of its semantic and syntactic information, so the vector representation output on this basis contains the complete context feature information of the current sentence, which in turn ensures the accuracy of subsequent book recommendation.
Using context sentence pair sequences constructed from the current sentence and its context sentences as the input data of the initial sequence-to-sequence model does not destroy the overall structure of the text data, and retains the textual features of interdependence and mutual influence among words, which ensures that the model learns the complete semantic and syntactic information contained in the sentence text and improves the accuracy of the model in extracting context sentence features.
According to the above scheme, semantic segmentation can be performed on the obtained initial sentence text to obtain a segmented sentence text, and a vector representation of the sentence text can be obtained with a pre-constructed sentence vector generation model through encoding processing that predicts the context of the sentence text, wherein the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model obtained by the steps described above. Compared with existing sentence vector generation schemes such as construction based on word vector averaging and construction based on contrastive learning, this embodiment trains the sequence-to-sequence model on context sentence pair sequences, and the sentence vectors generated by the encoding layer of the trained model preserve the integrity of the semantic and syntactic information of the sentence text, thereby effectively improving the accuracy of sentence vector generation.
Further, as a refinement and an extension of the specific implementation of the above embodiment, in order to fully explain the specific implementation process of the embodiment, another sentence vector generation method is provided, as shown in fig. 2, and the method includes:
step 201, using the initial sequence to the sequence model, encoding and decoding the current sentence in the sequence for the context sentences in the constructed sentence sample set, so as to obtain the upper predicted sentence and the lower predicted sentence of the current sentence.
Wherein the context sentence pair sequence specifically includes: a current sentence for inputting to the coding layer of the initial sequence to sequence model for context sentence prediction; and the upper target sentence and the lower target sentence are used for training the initial sequence to a sequence model output result, and the output result is the upper prediction sentence and the lower prediction sentence output in the model training process.
For explaining the specific implementation of step 201, as a preferred embodiment, step 201 may specifically include: according to the context sentence pair sequence, performing word segmentation processing by using a word segmentation tool to obtain a word segmented context sentence pair sequence; obtaining a sentence embedding vector of the current sentence by using the initial sequence to a coding layer of a sequence model according to the current sentence in the context sentence pair sequence after word segmentation; according to the sentence embedding vector of the current sentence, an upper predicted sentence and a lower predicted sentence are respectively obtained by utilizing the two decoding layers which are arranged in parallel from the initial sequence to the sequence model; wherein, the two decoding layers refer to a first decoding layer for predicting above and a second decoding layer for predicting below.
To illustrate a specific implementation manner of step 201, as another preferred embodiment, the step of obtaining an upper predicted sentence and a lower predicted sentence respectively by using two decoding layers arranged in parallel from the initial sequence to a sequence model according to a sentence embedding vector of the current sentence is specifically included; respectively taking the sentence embedding vector of the current sentence as input data of a reset gate, an update gate and a candidate memory unit in a first GRU model, and obtaining an upper predicted sentence of the current sentence through decoding processing; and using the sentence embedding vector of the current sentence as input data of a second GRU model, and obtaining a following predicted sentence of the current sentence through decoding processing.
In implementation, before the step of obtaining the above predicted sentence and the below predicted sentence of the current sentence by using the initial sequence to sequence model according to the current sentence in the context sentence pair sequence, the method further includes: a sample set of sentences is constructed, the sample set of sentences comprising a sequence of contextual sentence pairs. The method comprises the following specific steps:
1) Randomly select any book text, and split the selected book text into sentences based on Chinese punctuation marks to obtain a book text D = [S_1, S_2, S_3, S_4, S_5, ..., S_i, ..., S_n], where S_i represents the i-th sentence in the book text D, and n represents the number of sentences obtained by splitting the book text D. For example, the book text set includes 3727 books together with the full text content of each book; any book text is randomly selected from among them, and all the text content of the selected book is split into sentences.
2) Construct context sentence pair sequences based on the book text D, that is, traverse each sentence in the book text D to build context sentence pair sequences and obtain the sentence sample set G. The context sentence pair sequences are represented as (S_1, S_2, S_3), (S_2, S_3, S_4), (S_3, S_4, S_5), ..., (S_{i-1}, S_i, S_{i+1}), ..., (S_{n-2}, S_{n-1}, S_n), where S_i represents the current sentence, S_{i-1} represents the preceding target sentence adjacent to S_i, and S_{i+1} represents the following target sentence adjacent to S_i.
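A minimal sketch of this sample-set construction follows; the function and variable names are illustrative:

```python
def build_context_pairs(book_sentences):
    """Build context sentence pair sequences (S_{i-1}, S_i, S_{i+1})
    from the ordered sentence list of one book text D."""
    return [
        (book_sentences[i - 1], book_sentences[i], book_sentences[i + 1])
        for i in range(1, len(book_sentences) - 1)
    ]

# The sentence sample set G is the union of the triples built from
# every selected book text.
G = build_context_pairs(["句子一。", "句子二。", "句子三。", "句子四。"])
# -> [(句子一, 句子二, 句子三), (句子二, 句子三, 句子四)]
```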
In implementation, the encoding layer Encoder of the initial sequence-to-sequence model outputs the sentence embedding vector h_s of the current sentence, which is fed synchronously into a first decoding layer pre-Decoder for predicting the preceding sentence and a second decoding layer next-Decoder for predicting the following sentence; the pre-Decoder and the next-Decoder then produce the preceding predicted sentence and the following predicted sentence of the current sentence, respectively. As shown in fig. 3, the specific steps include:
1) Perform word segmentation on each context sentence in the sentence sample set G with a word segmentation tool (the LTP model from the Harbin Institute of Technology), the word-segmented sentence being expressed as S_i = [t_1, t_2, ..., t_p, ..., t_l], where t_p denotes the p-th token in S_i, and l represents the number of tokens obtained after word segmentation of S_i.
2) Construct the initial sequence-to-sequence model based on the encoder-decoder architecture. The initial sequence-to-sequence model comprises one encoding layer and two decoding layers, and the base model of both the encoding layer and the decoding layers is the gated recurrent unit (GRU).
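For concreteness, below is a minimal sketch of this one-encoder/two-decoder architecture, assuming PyTorch; the class name and hyperparameters are illustrative, and concatenating h_s into the pre-decoder input is a simplified stand-in for the gate-level injection described in step ② below:

```python
import torch
import torch.nn as nn

class ContextSeq2Seq(nn.Module):
    """One GRU encoder with two parallel GRU decoders (pre/next)."""
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # pre-Decoder sees h_s at every step (simplified stand-in for
        # injecting h_s into the update/reset/candidate gates).
        self.pre_decoder = nn.GRU(emb_dim + hid_dim, hid_dim, batch_first=True)
        # next-Decoder uses h_s only as its initial hidden state.
        self.next_decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.pre_out = nn.Linear(hid_dim, vocab_size)
        self.next_out = nn.Linear(hid_dim, vocab_size)

    def forward(self, cur_ids, prev_ids, next_ids):
        _, h_s = self.encoder(self.embed(cur_ids))   # h_s: (1, B, H)
        batch, steps = prev_ids.shape
        hs_rep = h_s.transpose(0, 1).expand(batch, steps, -1)
        pre_in = torch.cat([self.embed(prev_ids), hs_rep], dim=-1)
        pre_h, _ = self.pre_decoder(pre_in, h_s)
        next_h, _ = self.next_decoder(self.embed(next_ids), h_s)
        return self.pre_out(pre_h), self.next_out(next_h)
```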
3) Use the word-segmented sentence sample set G as the input of the initial sequence-to-sequence model: input the current sentence of each context sentence pair sequence into the encoding layer Encoder of the initial sequence-to-sequence model, obtain the sentence embedding vector h_s of the current sentence through encoding processing, and decode h_s synchronously with the first decoding layer pre-Decoder and the second decoding layer next-Decoder, to obtain the preceding predicted sentence and the following predicted sentence corresponding to the current sentence, respectively. The method specifically comprises the following steps:
① Take the current sentence of the context sentence pair sequence as the input of the encoding layer Encoder of the initial sequence-to-sequence model. Taking (S_{i-1}, S_i, S_{i+1}) as an example, input the word-segmented sentence S_i = [t_1, t_2, ..., t_p, ..., t_l] into the encoding layer Encoder, and obtain the sentence embedding vector h_s of S_i through encoding processing.
② Take the sentence embedding vector h_s as the input of the first decoding layer pre-Decoder, and obtain the preceding predicted sentence Y_{i-1} corresponding to the current sentence through decoding processing. Predicting the preceding sentence Y_{i-1} from the sentence embedding vector h_s of the current sentence S_i runs against the top-down character of natural language, so the training difficulty of the first decoding layer is greater than that of the second decoding layer next-Decoder, which decodes the following sentence. The GRU model architecture is therefore improved to raise the accuracy of preceding-sentence prediction, ensure training efficiency, and prevent vanishing gradients. Specifically, the sentence embedding vector h_s of the current sentence is added to the inputs of the update gate, the reset gate, and the candidate memory unit in the first decoding layer, with corresponding parameters, so that at every time step the GRU model can incorporate the sentence embedding vector h_s of the current sentence S_i during token-by-token generation. The specific formulas are as follows:
z_t = σ(W_z x_t + U_z h_{t-1} + V_z h_s)
r_t = σ(W_r x_t + U_r h_{t-1} + V_r h_s)
h̃_t = tanh(W_k x_t + U_k (r_t ⊙ h_{t-1}) + V_k h_s)
h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ h̃_t
where z_t denotes the update gate of the GRU model; W_z and U_z are the update gate parameters of the original GRU model; x_t is the input vector at the current time t; h_{t-1} is the vector passed to the current time t from the previous time t-1; and V_z is the parameter set for the sentence embedding vector h_s. Similarly, the reset gate r_t and the candidate memory unit h̃_t of the GRU model both incorporate the sentence embedding vector h_s; W_r, U_r, V_r are the parameters of the reset gate; tanh denotes the activation function; W_k, U_k, V_k are the parameters of the candidate memory unit; h_t is the output vector at the current time t; σ denotes a fully connected layer with an activation function; and ⊙ denotes element-wise multiplication of vectors.
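The modified cell can be sketched as follows, assuming PyTorch; parameter names mirror the formulas above, and bias terms are omitted as in the formulas:

```python
import torch
import torch.nn as nn

class SentenceGRUCell(nn.Module):
    """GRU cell whose update gate, reset gate, and candidate memory unit
    each receive the sentence embedding vector h_s as an extra input."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        mk = lambda n: nn.Linear(n, hidden_size, bias=False)
        self.W_z, self.U_z, self.V_z = mk(input_size), mk(hidden_size), mk(hidden_size)
        self.W_r, self.U_r, self.V_r = mk(input_size), mk(hidden_size), mk(hidden_size)
        self.W_k, self.U_k, self.V_k = mk(input_size), mk(hidden_size), mk(hidden_size)

    def forward(self, x_t, h_prev, h_s):
        z_t = torch.sigmoid(self.W_z(x_t) + self.U_z(h_prev) + self.V_z(h_s))
        r_t = torch.sigmoid(self.W_r(x_t) + self.U_r(h_prev) + self.V_r(h_s))
        h_cand = torch.tanh(self.W_k(x_t) + self.U_k(r_t * h_prev) + self.V_k(h_s))
        return z_t * h_prev + (1.0 - z_t) * h_cand  # h_t
```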
③ Synchronously with the first decoding layer, input the sentence embedding vector h_s into the second decoding layer next-Decoder, and obtain the following predicted sentence Y_{i+1} corresponding to the current sentence through decoding processing. Predicting the following sentence from the current sentence conforms to the top-down character of natural language, so the second decoding layer next-Decoder adopts the existing GRU model, and the sentence embedding vector h_s serves only as the initial vector of the second decoding layer.
In this way, predicting the preceding sentence of the current sentence on the encoder-decoder architecture breaks the top-down rule of natural language and raises the difficulty of model training, so the model can be trained thoroughly and output sentence vector representations containing complete semantic and syntactic information. Furthermore, the improvement of the update gate, reset gate, and candidate memory unit of the GRU model raises the training difficulty while effectively guaranteeing training efficiency.
Step 202, training the initial sequence-to-sequence model with a target loss function according to the preceding predicted sentence and the following predicted sentence of the current sentence, to obtain the trained sequence-to-sequence model. The target loss function is determined as the sum of a first loss function and a second loss function, where the first loss function is set based on the first decoding layer for predicting the preceding text, and the second loss function is set based on the second decoding layer for predicting the following text.
In implementation, the network parameters of the initial sequence-to-sequence model are trained with the target loss function according to the preceding target sentence S_{i-1}, the following target sentence S_{i+1}, the preceding predicted sentence Y_{i-1}, and the following predicted sentence Y_{i+1}, until the initial sequence-to-sequence model converges, yielding the trained sequence-to-sequence model. Specifically, the cross-entropy loss function is used as the base loss function, with the following formula:
CE(S, Y) = -Σ_{j=1}^{l} t_j · log(y_j)
where CE denotes the cross-entropy loss function, S denotes the current sentence, Y denotes the predicted sentence generated by the decoding layer Decoder, l denotes the number of tokens determined after word segmentation of the current sentence S, t_j denotes the j-th token obtained by word segmentation of the current sentence S, and y_j denotes the j-th token in the predicted sentence Y.
Further, based on the first decoding layer pre-Decoder and the second decoding layer next-Decoder, which output the preceding predicted sentence and the following predicted sentence respectively, the corresponding preceding-sentence loss function (first loss function) and following-sentence loss function (second loss function) are determined, and the target loss function of the initial sequence-to-sequence model is obtained as their sum:
Loss = CE(S_{i-1}, Y_{i-1}) + CE(S_{i+1}, Y_{i+1})
where CE(S_{i-1}, Y_{i-1}) denotes the preceding-sentence loss function pre-loss, and CE(S_{i+1}, Y_{i+1}) denotes the following-sentence loss function next-loss.
According to the requirements of the actual application scenario, the initial sequence-to-sequence model is trained with a batch size of 128, 50 epochs, and a learning rate lr of 0.005, until the target loss function value of the initial sequence-to-sequence model stabilizes, at which point training ends and the trained sequence-to-sequence model is obtained.
Step 203, performing semantic segmentation on the obtained initial sentence text to obtain a segmented sentence text.
Step 204, obtaining a vector representation of the sentence text with the pre-constructed sentence vector generation model, through encoding processing that predicts the context of the sentence text, wherein the sentence vector generation model is the encoding layer of the trained sequence-to-sequence model.
In implementation, the encoding layer of the trained sequence-to-sequence model is extracted to serve as the sentence vector generation model. After a book recommendation request is received, the introduction text corresponding to the book title in the request is obtained, the introduction text is split into sentences based on Chinese punctuation marks, the split introduction text is word-segmented with the LTP model from the Harbin Institute of Technology to obtain the segmented sentence texts, and the sentence vector generation model then encodes the sentence texts to obtain their vector representations.
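Continuing the earlier sketch, extracting the encoder as the sentence vector generation model might look like this (illustrative; `model` is a trained ContextSeq2Seq from the sketch above):

```python
import torch

@torch.no_grad()
def sentence_vector(model, token_ids):
    """Run only the trained encoder and return the sentence embedding h_s."""
    _, h_s = model.encoder(model.embed(token_ids))  # h_s: (1, B, H)
    return h_s.squeeze(0)                           # (B, H)
```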
Step 205, calculating similarity values between the vector representation of the sentence text and the sentence embedding vectors in a preset book sample library, where the sentence embedding vectors in the preset book sample library are output by the sentence vector generation model.
In implementation, according to the introduction text of each book in the initial book sample library, the sentence embedding vector corresponding to each introduction text is output with the sentence vector generation model, and the preset book sample library is constructed from these sentence embedding vectors. The similarity values between the sentence vector generated for the book recommendation request and the sentence embedding vector of each book in the preset book sample library are then calculated with the cosine similarity algorithm.
Step 206, generating book recommendation information for the sentence text according to the sentence embedding vectors in the preset book sample library whose similarity values meet a preset condition.
In implementation, when a user browses a book on the platform, that book is taken as the target book and a book recommendation request containing its title is generated. The corresponding sentence vector is generated with the sentence vector generation model from the introduction text of the target book, the similarity values between the generated sentence vector and each group of sentence embedding vectors in the preset book sample library of the platform are calculated and sorted in descending order, and the books corresponding to the sentence embedding vectors whose similarity values meet the preset condition are recommended to the user as similar books. Experiments with an online A/B test show that the user click-through rate obtained with this embodiment can be improved by 2.31%.
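A minimal sketch of this cosine-similarity ranking, assuming NumPy and illustrative names:

```python
import numpy as np

def recommend_similar_books(query_vec, library_vecs, book_ids, top_k=10):
    """Rank library books by cosine similarity to the query sentence vector.

    query_vec: (d,) vector for the target book's introduction text;
    library_vecs: (N, d) matrix of sentence embeddings in the sample library.
    """
    q = query_vec / np.linalg.norm(query_vec)
    lib = library_vecs / np.linalg.norm(library_vecs, axis=1, keepdims=True)
    sims = lib @ q                      # cosine similarity per book
    order = np.argsort(-sims)[:top_k]   # descending order
    return [(book_ids[i], float(sims[i])) for i in order]
```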
By applying the technical solution of this embodiment, semantic segmentation is performed on the obtained initial sentence text to obtain a segmented sentence text, and a vector representation of the sentence text is obtained with a pre-constructed sentence vector generation model through encoding processing that predicts the context of the sentence text, wherein the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model; the trained sequence-to-sequence model is obtained by performing, with an initial sequence-to-sequence model, encoding processing and context decoding processing on the current sentence in each context sentence pair sequence in a constructed sentence sample set to obtain a preceding predicted sentence and a following predicted sentence of the current sentence, and then training against these predicted sentences. Training the sequence-to-sequence model on context sentence pair sequences and generating sentence vectors with the encoding layer of the trained model raises the training difficulty of the model while effectively improving the accuracy of sentence vector generation, and preserves the integrity of the semantic and syntactic information of the generated sentence vectors, thereby avoiding the technical problems that existing construction methods based on word vector averaging destroy the dependency relationships among words in a sentence, leading to low accuracy of sentence feature extraction, and that construction methods based on contrastive learning suffer from low model training difficulty, insufficient migration capability in practical tasks, and low accuracy of the generated sentence vectors.
Further, as a specific implementation of the method in fig. 1, an embodiment of the present application provides a sentence vector generation apparatus, as shown in fig. 4, the apparatus includes: a model training module 41, a preprocessing module 42, and an encoding module 43.
The model training module 41 may be configured to perform, with an initial sequence-to-sequence model, encoding processing and context decoding processing on the current sentence in each context sentence pair sequence in the constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence, and to obtain a trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
The preprocessing module 42 may be configured to perform semantic segmentation on the obtained initial sentence text to obtain a segmented sentence text.
The encoding module 43 may be configured to obtain a vector representation of the sentence text with a pre-constructed sentence vector generation model, through encoding processing that predicts the context of the sentence text, where the sentence vector generation model is the encoding layer of the trained sequence-to-sequence model.
In a specific application scenario, as shown in FIG. 5, a book recommendation module 44 is further included.
In a specific application scenario, the model training module 41 includes a training unit 411.
The training unit 411 may be configured to train the initial sequence-to-sequence model with a target loss function according to the preceding predicted sentence and the following predicted sentence of the current sentence, to obtain the trained sequence-to-sequence model, where the target loss function is determined as the sum of a first loss function and a second loss function.
In a specific application scenario, the context sentence pair sequence specifically includes: a current sentence, to be input into the encoding layer of the initial sequence-to-sequence model for context sentence prediction; and a preceding target sentence and a following target sentence, used to train against the output of the initial sequence-to-sequence model, where the output consists of the preceding predicted sentence and the following predicted sentence produced during model training.
In a specific application scenario, the model training module 41 may be specifically configured to: perform word segmentation on the context sentence pair sequence with a word segmentation tool to obtain a word-segmented context sentence pair sequence; obtain a sentence embedding vector of the current sentence with the encoding layer of the initial sequence-to-sequence model according to the current sentence in the word-segmented context sentence pair sequence; and obtain a preceding predicted sentence and a following predicted sentence with the two decoding layers arranged in parallel in the initial sequence-to-sequence model according to the sentence embedding vector of the current sentence, where the two decoding layers are a first decoding layer for predicting the preceding text and a second decoding layer for predicting the following text.
In a specific application scenario, the first decoding layer for predicting the preceding text is a first GRU model, and the second decoding layer for predicting the following text is a second GRU model. The step of obtaining the preceding predicted sentence and the following predicted sentence respectively with the two decoding layers arranged in parallel in the initial sequence-to-sequence model, according to the sentence embedding vector of the current sentence, specifically comprises: taking the sentence embedding vector of the current sentence as input data of the reset gate, the update gate, and the candidate memory unit in the first GRU model, and obtaining the preceding predicted sentence of the current sentence through decoding processing; and taking the sentence embedding vector of the current sentence as input data of the second GRU model, and obtaining the following predicted sentence of the current sentence through decoding processing.
In a specific application scenario, the first loss function of the target loss function is set based on the first decoding layer for predicting the preceding text, and the second loss function of the target loss function is set based on the second decoding layer for predicting the following text.
In a specific application scenario, the book recommendation module 44 includes a similarity calculation unit 441 and a generation unit 442.
The similarity calculation unit 441 may be configured to calculate similarity values between the vector representation of the sentence text and the sentence embedding vectors in the preset book sample library.
The generating unit 442 may be configured to generate book recommendation information for the sentence text according to the sentence embedding vectors in the preset book sample library whose similarity values meet a preset condition, where the sentence embedding vectors in the preset book sample library are output by the sentence vector generation model.
It should be noted that other corresponding descriptions of the functional units related to the sentence vector generation apparatus provided in the embodiment of the present application may refer to the corresponding descriptions in fig. 1 and fig. 2, and are not described herein again.
Based on the above methods shown in fig. 1 and fig. 2, correspondingly, the present application further provides a storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for sentence vector generation shown in fig. 1 and fig. 2 is implemented.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the implementation scenarios of the present application.
Based on the method shown in fig. 1 and fig. 2 and the virtual device embodiment shown in fig. 4 and fig. 5, in order to achieve the above object, an embodiment of the present application further provides a computer device, which may specifically be a personal computer, a server, a network device, and the like, where the entity device includes a storage medium and a processor; a storage medium for storing a computer program; a processor for executing a computer program to implement the sentence vector generation method as shown in fig. 1 and 2.
Optionally, the computer device may further include a user interface, a network interface, a camera, Radio Frequency (RF) circuitry, a sensor, audio circuitry, a WI-FI module, and so forth. The user interface may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), etc., and the optional user interface may also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a bluetooth interface, WI-FI interface), etc.
It will be understood by those skilled in the art that the computer device structure provided in this embodiment does not limit the physical device, which may include more or fewer components, combine certain components, or arrange the components differently.
The storage medium may further include an operating system and a network communication module. An operating system is a program that manages the hardware and software resources of a computer device, supporting the execution of information handling programs and other software and/or programs. The network communication module is used for realizing communication among components in the storage medium and other hardware and software in the entity device.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general hardware platform, or by hardware. Applying the technical solution of the present application, compared with existing sentence vector generation schemes such as construction based on word vector averaging and construction based on contrastive learning, this embodiment trains the sequence-to-sequence model on context sentence pair sequences and generates sentence vectors of the sentence text with the encoding layer of the trained sequence-to-sequence model, which preserves the integrity of the semantic and syntactic information of the sentence text and thereby effectively improves the accuracy of sentence vector generation. This avoids the problems that existing construction methods based on word vector averaging destroy the dependency relationships among words in sentences, resulting in low accuracy of sentence feature extraction, and that construction methods based on contrastive learning suffer from low model training difficulty, insufficient migration capability in practical tasks, and low accuracy of the generated sentence vectors.
Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present application. Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The above application serial numbers are for description purposes only and do not represent the superiority or inferiority of the implementation scenarios. The above disclosure is only a few specific implementation scenarios of the present application, but the present application is not limited thereto, and any variations that can be made by those skilled in the art are intended to fall within the scope of the present application.

Claims (10)

1. A sentence vector generation method, comprising:
performing semantic segmentation on the obtained initial sentence text to obtain a segmented sentence text;
obtaining a vector representation of the sentence text with a pre-constructed sentence vector generation model, through encoding processing that predicts the context of the sentence text, wherein the sentence vector generation model is the encoding layer of a trained sequence-to-sequence model;
wherein the trained sequence-to-sequence model is obtained by the following steps:
performing, with an initial sequence-to-sequence model, encoding processing and context decoding processing on the current sentence in each context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence;
and obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence.
2. The method of claim 1, wherein the step of obtaining the trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence comprises:
training the initial sequence-to-sequence model with a target loss function according to the preceding predicted sentence and the following predicted sentence of the current sentence, to obtain the trained sequence-to-sequence model;
wherein the target loss function is determined as the sum of a first loss function and a second loss function.
3. The method according to claim 1 or 2, wherein the context sentence pair sequence specifically comprises:
a current sentence, to be input into the encoding layer of the initial sequence-to-sequence model for context sentence prediction;
and a preceding target sentence and a following target sentence, used to train against the output of the initial sequence-to-sequence model, the output being the preceding predicted sentence and the following predicted sentence produced during model training.
4. The method according to claim 1, wherein the step of performing, with the initial sequence-to-sequence model, encoding processing and context decoding processing on the current sentence in each context sentence pair sequence in the constructed sentence sample set, to obtain the preceding predicted sentence and the following predicted sentence of the current sentence, specifically comprises:
performing word segmentation on the context sentence pair sequence with a word segmentation tool to obtain a word-segmented context sentence pair sequence;
obtaining a sentence embedding vector of the current sentence with the encoding layer of the initial sequence-to-sequence model, according to the current sentence in the word-segmented context sentence pair sequence;
obtaining a preceding predicted sentence and a following predicted sentence respectively with two decoding layers arranged in parallel in the initial sequence-to-sequence model, according to the sentence embedding vector of the current sentence;
wherein the two decoding layers are a first decoding layer for predicting the preceding text and a second decoding layer for predicting the following text.
5. The method according to claim 4, wherein the first decoding layer for predicting the preceding text is a first GRU model and the second decoding layer for predicting the following text is a second GRU model, and the step of obtaining the preceding predicted sentence and the following predicted sentence respectively with the two decoding layers arranged in parallel in the initial sequence-to-sequence model, according to the sentence embedding vector of the current sentence, comprises:
taking the sentence embedding vector of the current sentence as input data of the reset gate, the update gate, and the candidate memory unit in the first GRU model, and obtaining the preceding predicted sentence of the current sentence through decoding processing;
and taking the sentence embedding vector of the current sentence as input data of the second GRU model, and obtaining the following predicted sentence of the current sentence through decoding processing.
6. The method of claim 2 or 4, wherein the first loss function of the target loss function is set based on the first decoding layer for predicting the preceding text, and the second loss function of the target loss function is set based on the second decoding layer for predicting the following text.
7. The method of claim 1, wherein after the step of obtaining the vector representation of the sentence text with the sentence vector generation model through encoding processing that predicts the context of the sentence text, the method further comprises:
calculating similarity values between the vector representation of the sentence text and sentence embedding vectors in a preset book sample library;
generating book recommendation information for the sentence text according to the sentence embedding vectors in the preset book sample library whose similarity values meet a preset condition;
wherein the sentence embedding vectors in the preset book sample library are output by the sentence vector generation model.
8. A sentence vector generation apparatus, comprising:
a model training module, configured to perform, with an initial sequence-to-sequence model, encoding processing and context decoding processing on the current sentence in each context sentence pair sequence in a constructed sentence sample set, to obtain a preceding predicted sentence and a following predicted sentence of the current sentence, and to obtain a trained sequence-to-sequence model according to the preceding predicted sentence and the following predicted sentence;
a preprocessing module, configured to perform semantic segmentation on the obtained initial sentence text to obtain a segmented sentence text;
and an encoding module, configured to obtain a vector representation of the sentence text with a pre-constructed sentence vector generation model, through encoding processing that predicts the context of the sentence text, the sentence vector generation model being the encoding layer of the trained sequence-to-sequence model.
9. A computer device comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, wherein the processor implements the sentence vector generation method of any of claims 1 to 7 when executing the program.
10. A storage medium on which a computer program is stored, the program, when executed by a processor, implementing the sentence vector generation method of any of claims 1 to 7.
CN202210232057.9A 2022-03-09 2022-03-09 Sentence vector generation method and device, computer equipment and storage medium Withdrawn CN114444471A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210232057.9A CN114444471A (en) 2022-03-09 2022-03-09 Sentence vector generation method and device, computer equipment and storage medium
PCT/CN2022/089817 WO2023168814A1 (en) 2022-03-09 2022-04-28 Sentence vector generation method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210232057.9A CN114444471A (en) 2022-03-09 2022-03-09 Sentence vector generation method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114444471A (en) 2022-05-06

Family

ID=81359057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210232057.9A Withdrawn CN114444471A (en) 2022-03-09 2022-03-09 Sentence vector generation method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114444471A (en)
WO (1) WO2023168814A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2568233A (en) * 2017-10-27 2019-05-15 Babylon Partners Ltd A computer implemented determination method and system
US10929614B2 (en) * 2019-01-03 2021-02-23 International Business Machines Corporation Automated contextual dialog generation for cognitive conversation
CN110147533B (en) * 2019-01-24 2023-08-29 腾讯科技(深圳)有限公司 Encoding method, apparatus, device and storage medium
CN111178082A (en) * 2019-12-05 2020-05-19 北京葡萄智学科技有限公司 Sentence vector generation method and device and electronic equipment
CN112052329A (en) * 2020-09-02 2020-12-08 平安科技(深圳)有限公司 Text abstract generation method and device, computer equipment and readable storage medium

Also Published As

Publication number Publication date
WO2023168814A1 (en) 2023-09-14

Similar Documents

Publication Publication Date Title
CN111444340B (en) Text classification method, device, equipment and storage medium
CN107680580B (en) Text conversion model training method and device, and text conversion method and device
WO2022007823A1 (en) Text data processing method and device
CN112015859A (en) Text knowledge hierarchy extraction method and device, computer equipment and readable medium
CN110990543A (en) Intelligent conversation generation method and device, computer equipment and computer storage medium
CN110163181B (en) Sign language identification method and device
CN110234018B (en) Multimedia content description generation method, training method, device, equipment and medium
CN109271493A (en) A kind of language text processing method, device and storage medium
CN109961041B (en) Video identification method and device and storage medium
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN113011202A (en) End-to-end image text translation method, system and device based on multi-task training
CN108304376B (en) Text vector determination method and device, storage medium and electronic device
CN113705313A (en) Text recognition method, device, equipment and medium
CN111178036B (en) Text similarity matching model compression method and system for knowledge distillation
CN115221846A (en) Data processing method and related equipment
CN116541492A (en) Data processing method and related equipment
CN114490926A (en) Method and device for determining similar problems, storage medium and terminal
CN113516972B (en) Speech recognition method, device, computer equipment and storage medium
CN113836929A (en) Named entity recognition method, device, equipment and storage medium
CN113722436A (en) Text information extraction method and device, computer equipment and storage medium
CN111445545B (en) Text transfer mapping method and device, storage medium and electronic equipment
CN116958343A (en) Facial animation generation method, device, equipment, medium and program product
CN110852066B (en) Multi-language entity relation extraction method and system based on confrontation training mechanism
CN114444471A (en) Sentence vector generation method and device, computer equipment and storage medium
CN116821781A (en) Classification model training method, text analysis method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220506