WO2023168814A1 - Sentence vector generation method and apparatus, computer device, and storage medium - Google Patents
Sentence vector generation method and apparatus, computer device, and storage medium
- Publication number
- WO2023168814A1, PCT/CN2022/089817, CN2022089817W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sentence
- sequence
- model
- context
- current
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
Definitions
- This application relates to the field of artificial intelligence technology, and in particular to a sentence vector generation method and apparatus, a computer device, and a storage medium.
- Sentence embeddings, as vector representations of text data, are widely used in many application scenarios of natural language processing.
- By mapping text data into a quantifiable vector space, sentence vector representations can be obtained that capture the features, semantics, grammar, and other information of the text; vector clustering, classification, and other methods can then be used to derive the relationships between sentences, enabling sentence vectors to be applied in real-world scenarios.
- Existing solutions for sentence vector construction mainly include construction methods based on word vector averaging and construction methods based on contrastive learning.
- Construction methods based on word vector averaging rely on models such as word2vec, GloVe, and BERT.
- Construction methods based on contrastive learning build positive samples for contrastive learning using techniques such as dropout, replacement, deletion, and back-translation.
- The inventor realized that the shortcomings of the existing solutions are: 1) construction methods based on word vector averaging destroy the dependencies between words in a sentence, so the accuracy of feature extraction is low; 2) in construction methods based on contrastive learning, the similarity between the randomly selected negative samples and the original sentences is low, which makes model training insufficiently difficult.
- As a result, the transfer ability of the model in actual tasks is insufficient, which in turn causes the generated sentence vectors to have lower accuracy.
- This application provides a sentence vector generation method and apparatus, a computer device, and a storage medium.
- The main purpose is to solve two technical problems in the existing technology: construction methods based on word vector averaging have low accuracy in sentence feature extraction, and construction methods based on contrastive learning yield models with insufficient transfer ability in actual tasks, resulting in low accuracy of the generated sentence vectors.
- A sentence vector generation method, the method including:
- performing semantic segmentation on the obtained initial sentence text to obtain segmented sentence text; and using a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
- the trained sequence-to-sequence model being obtained through the following steps:
- using the initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in the context sentence pair sequences of the constructed sentence sample set to obtain the above predicted sentence and the below predicted sentence of the current sentence;
- obtaining the trained sequence-to-sequence model according to the above predicted sentence and the below predicted sentence.
- A sentence vector generation device, the device including:
- a model training module, which can be used to use the initial sequence-to-sequence model to perform encoding processing and context decoding processing on the current sentence in the context sentence pair sequences of the constructed sentence sample set to obtain the above predicted sentence and the below predicted sentence of the current sentence, and to obtain a trained sequence-to-sequence model based on the above predicted sentence and the below predicted sentence;
- a preprocessing module, used to perform semantic segmentation on the obtained initial sentence text to obtain segmented sentence text;
- and an encoding module, configured to utilize a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text,
- where the sentence vector generation model is the encoding layer of the trained sequence-to-sequence model.
- A storage medium on which a computer program is stored.
- When the program is executed, the above sentence vector generation method is implemented, including:
- performing semantic segmentation on the obtained initial sentence text to obtain segmented sentence text; and using a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
- the trained sequence-to-sequence model being obtained through the following steps:
- using the initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in the context sentence pair sequences of the constructed sentence sample set to obtain the above predicted sentence and the below predicted sentence of the current sentence;
- obtaining the trained sequence-to-sequence model according to the above predicted sentence and the below predicted sentence.
- A computer device including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor.
- When the processor executes the program, the above sentence vector generation method is realized, including:
- performing semantic segmentation on the obtained initial sentence text to obtain segmented sentence text; and using a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
- the trained sequence-to-sequence model being obtained through the following steps:
- using the initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in the context sentence pair sequences of the constructed sentence sample set to obtain the above predicted sentence and the below predicted sentence of the current sentence;
- obtaining the trained sequence-to-sequence model according to the above predicted sentence and the below predicted sentence.
- Performing sequence-to-sequence model training based on context sentence pair sequences, and using the encoding layer of the trained sequence-to-sequence model to generate sentence vectors, can effectively improve the accuracy of sentence vector generation while increasing the difficulty of model training, and ensures the integrity of the semantic and grammatical information of the generated sentence vectors.
- This effectively avoids the technical problems of the existing solutions: construction methods based on word vector averaging destroy the dependencies between words in a sentence, resulting in low accuracy of sentence feature extraction; and in construction methods based on contrastive learning, the training difficulty of the model is low, the transfer ability of the model in actual tasks is insufficient, and the accuracy of the generated sentence vectors is low.
- Figure 1 shows a schematic flowchart of a sentence vector generation method provided by an embodiment of the present application
- Figure 2 shows a schematic flowchart of another sentence vector generation method provided by an embodiment of the present application
- Figure 3 shows a schematic diagram of the initial sequence-to-sequence model architecture provided by an embodiment of the present application
- Figure 4 shows a schematic structural diagram of a sentence vector generation device provided by an embodiment of the present application
- Figure 5 shows a schematic structural diagram of another sentence vector generation device provided by an embodiment of the present application.
- Artificial Intelligence (AI)
- AI is the theory, method, technology, and application system that uses digital computers or digital computer-controlled machines to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
- Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics.
- Artificial intelligence software technology mainly includes computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
- This embodiment provides a sentence vector generation method, as shown in Figure 1.
- The method is explained by taking its application to computer equipment such as servers as an example.
- The server can be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN: Content Delivery Network), and big data and artificial intelligence platforms.
- The method includes the following steps:
- Step 101: Perform semantic segmentation on the obtained initial sentence text to obtain segmented sentence text.
- In a book recommendation scenario, this is suitable for recommending other similar books based on the obtained book text content.
- When a book recommendation request is received, the book text content corresponding to the book title in the request is obtained, the book text content is segmented based on Chinese punctuation, and multiple sentence texts are obtained through text segmentation for input into the sentence vector generation model.
- The book text content can be book abstract text, book introduction text, etc., which is not specifically limited here.
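- The punctuation-based segmentation described above can be illustrated with a short sketch. This is a minimal illustration, not taken from the patent: the function name, the exact punctuation set, and the example text are assumptions.

```python
import re

def split_sentences(text: str) -> list:
    """Split book text into sentence texts on Chinese end-of-sentence
    punctuation, keeping each delimiter with the sentence it terminates."""
    parts = re.split(r"(?<=[。！？；])", text)
    return [p.strip() for p in parts if p.strip()]

# Example: a hypothetical book introduction split into candidate sentences.
intro = "这是一本关于机器学习的书。内容深入浅出！适合初学者阅读。"
print(split_sentences(intro))  # -> three sentence texts
```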
- Step 102: Use a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text.
- The sentence vector generation model is the encoding layer of a trained sequence-to-sequence model, where the trained sequence-to-sequence model is obtained through the following steps: using the initial sequence-to-sequence model, encoding processing and context decoding processing are performed on the current sentence in the context sentence pair sequences of the constructed sentence sample set to obtain the above predicted sentence and the below predicted sentence of the current sentence; according to the above predicted sentence and the below predicted sentence, the trained sequence-to-sequence model is obtained.
- In specific implementation, the initial sequence-to-sequence model is trained based on the constructed sentence sample set containing context sentence pair sequences, where a context sentence pair sequence includes the current sentence and the context sentences corresponding to the current sentence.
- The current sentence is input into the encoding layer of the initial sequence-to-sequence model for encoding processing to obtain a vector representation containing the context feature information of the current sentence.
- The vector representation containing the context feature information of the current sentence is then input into the two decoding layers set up in parallel in the initial sequence-to-sequence model,
- and the above predicted sentence and the below predicted sentence of the current sentence are obtained through decoding processing.
- The encoding layer of the trained sequence-to-sequence model has the ability to accurately predict the context of the current sentence and can retain the integrity of the semantic and grammatical information of that context. Therefore, the vector representation output on this basis can contain the complete contextual feature information of the current sentence, ensuring the accuracy of subsequent book recommendations.
- The context sentence pair sequence constructed from the current sentence and its context sentences is used as the input data of the initial sequence-to-sequence model, which retains the interdependence and mutual influence between words without destroying the overall structure of the text data, thereby ensuring
- that the model can learn the complete semantic and grammatical information contained in the sentence text, improving the accuracy of the model in extracting contextual sentence features.
- According to the above solution, the obtained initial sentence text can be semantically segmented to obtain segmented sentence text, and a pre-built sentence vector generation model can be used to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text.
- The sentence vector generation model is the encoding layer of the trained sequence-to-sequence model, where the trained sequence-to-sequence model is obtained through the following steps: using the initial sequence-to-sequence model, encoding processing and context decoding processing are performed on the current sentence in the context sentence pair sequences of the constructed sentence sample set to obtain the above predicted sentence and the below predicted sentence of the current sentence; according to the above predicted sentence and the below predicted sentence, the trained sequence-to-sequence model is obtained.
- This embodiment uses context sentence pair sequences for sequence-to-sequence model training and uses the encoding layer of the trained sequence-to-sequence model
- to generate sentence vectors; the generated sentence vectors preserve the integrity of the semantic and grammatical information of the sentence text, thereby effectively improving the accuracy of sentence vector generation.
- Step 201: Use the initial sequence-to-sequence model to perform encoding processing and context decoding processing on the current sentence in the context sentence pair sequences of the constructed sentence sample set to obtain the above predicted sentence and the below predicted sentence of the current sentence.
- A context sentence pair sequence specifically includes: the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction; and the above target sentence and the below target sentence, which are used to train against
- the output results of the initial sequence-to-sequence model, the output results being the above predicted sentence and the below predicted sentence produced during model training.
- Step 201 may specifically include: using a word segmentation tool to perform word segmentation on the context sentence pair sequences to obtain context sentence pair sequences after word segmentation;
- for the current sentence in each context sentence pair sequence after word segmentation, using the encoding layer of the initial sequence-to-sequence model to obtain the sentence embedding vector of the current sentence; and, according to the sentence embedding vector of the current sentence, using
- the two decoding layers set up in parallel in the initial sequence-to-sequence model to obtain the above predicted sentence and the below predicted sentence respectively, where the two decoding layers are a first decoding layer used to predict the above text and a second decoding layer used to predict the below text.
- The first decoding layer used to predict the above text is a first GRU model,
- and the second decoding layer used to predict the below text is a second GRU model.
- The step of obtaining the above predicted sentence and the below predicted sentence respectively, according to the sentence embedding vector of the current sentence and using the two decoding layers set up in parallel in the initial sequence-to-sequence model, specifically includes:
- using the sentence embedding vector of the current sentence as the input data of the reset gate, update gate, and candidate memory unit in the first GRU model, and obtaining the above predicted sentence of the current sentence through decoding processing;
- and using the sentence embedding vector of the current sentence as the input data of the second
- GRU model, and obtaining the below predicted sentence of the current sentence through decoding processing.
- Before the step of using the initial sequence-to-sequence model to obtain the above predicted sentence and the below predicted sentence of the current sentence based on the current sentence in the context sentence pair sequence, the method also includes: constructing a sentence sample set, where the sentence sample set includes the context sentence pair sequences. The specific steps include:
- The context sentence pair sequences are expressed as $(S_1, S_2, S_3), (S_2, S_3, S_4), (S_3, S_4, S_5), \ldots, (S_{i-1}, S_i, S_{i+1}), \ldots, (S_{n-2}, S_{n-1}, S_n)$, where $S_i$ represents the current sentence, $S_{i-1}$ represents the above target sentence adjacent to $S_i$, and $S_{i+1}$ represents the below target sentence adjacent to $S_i$.
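- The sliding-window construction of these triples can be sketched as follows; the helper name build_context_triples is illustrative and assumes the sentences of a document are already in order.

```python
def build_context_triples(sentences):
    """Build (S_{i-1}, S_i, S_{i+1}) triples from an ordered sentence list:
    above target sentence, current sentence, below target sentence."""
    return [
        (sentences[i - 1], sentences[i], sentences[i + 1])
        for i in range(1, len(sentences) - 1)
    ]

sample = ["S1", "S2", "S3", "S4", "S5"]
# -> [('S1', 'S2', 'S3'), ('S2', 'S3', 'S4'), ('S3', 'S4', 'S5')]
print(build_context_triples(sample))
```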
- The encoding layer Encoder of the initial sequence-to-sequence model outputs the sentence embedding vector $h_s$ of the current sentence, which is simultaneously input into the first decoding layer pre-Decoder, used to predict the above sentence sequence, and
- the second decoding layer next-Decoder, used to predict the below sentence sequence; the first decoding layer pre-Decoder and the second decoding layer next-Decoder are then used to obtain the above predicted sentence and the below predicted sentence of the current sentence respectively, as shown in Figure 3. The specific steps include:
- The initial sequence-to-sequence model includes one encoding layer and two decoding layers.
- The basic model of both the encoding layer and the decoding layers is the gated recurrent unit (GRU: Gated Recurrent Unit).
- The first decoding layer pre-Decoder and the second decoding layer next-Decoder decode the sentence embedding vector $h_s$ synchronously to obtain the above predicted sentence and the below predicted sentence corresponding to the current sentence. This specifically includes:
- using the sentence embedding vector $h_s$ as the input of the first decoding layer pre-Decoder (above decoding), and obtaining the above predicted sentence $Y_{i-1}$ corresponding to the current sentence through decoding processing.
- That is, the sentence embedding vector $h_s$ of the current sentence $S_i$ is used to predict the above predicted sentence $Y_{i-1}$ corresponding to the current sentence. Since predicting upward does not conform to the characteristics of natural language, the training difficulty of the first decoding layer pre-Decoder is greater than that of the second decoding layer next-Decoder;
- the first decoding layer therefore improves on the GRU model architecture to increase the accuracy of the above prediction while ensuring training efficiency and preventing vanishing gradients.
- Specifically, the GRU model at each time step incorporates the sentence embedding vector $h_s$ of the current sentence $S_i$.
- The specific formulas are as follows:
- $z_t = \sigma(W_z x_t + U_z h_{t-1} + V_z h_s)$
- $r_t = \sigma(W_r x_t + U_r h_{t-1} + V_r h_s)$
- $\tilde{h}_t = \tanh(W_k x_t + U_k (r_t \odot h_{t-1}) + V_k h_s)$
- $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$
- Here $z_t$ represents the update gate of the GRU model; $W_z$, $U_z$ are the update gate parameters of the original GRU model; $x_t$ represents the input vector at the current time step $t$; $h_{t-1}$ represents the output vector at the previous time step $t-1$; and $V_z$ represents the parameters introduced for the sentence embedding vector $h_s$.
- The reset gate $r_t$ and the candidate memory unit $\tilde{h}_t$ of the GRU model likewise incorporate the sentence embedding $h_s$: $W_r$, $U_r$, $V_r$ represent the parameters of the reset gate; $\tanh$ represents the activation function; $W_k$, $U_k$, $V_k$ represent the parameters of the candidate memory unit; $h_t$ represents the output vector at the current time step $t$; $\sigma$ represents a fully connected layer with an activation function; and $\odot$ represents element-wise multiplication of vectors. The final combination of $h_{t-1}$ and $\tilde{h}_t$ follows the standard GRU update convention.
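- A sketch of the modified GRU cell matching the equations above, written in PyTorch. The class name, constructor signature, and layer sizes are assumptions; each nn.Linear bundles the corresponding $W$, $U$, and $V$ matrices by acting on the concatenation of $x_t$, $h_{t-1}$, and $h_s$, which is algebraically equivalent to applying them separately, and the final combination uses the standard GRU convention as noted above.

```python
import torch
import torch.nn as nn

class SentenceConditionedGRUCell(nn.Module):
    """GRU cell whose update gate, reset gate, and candidate memory unit
    all incorporate the sentence embedding vector h_s."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # Each layer bundles W (for x_t), U (for h_{t-1}), V (for h_s).
        self.z_gate = nn.Linear(input_size + 2 * hidden_size, hidden_size)
        self.r_gate = nn.Linear(input_size + 2 * hidden_size, hidden_size)
        self.k_cand = nn.Linear(input_size + 2 * hidden_size, hidden_size)

    def forward(self, x_t, h_prev, h_s):
        gate_in = torch.cat([x_t, h_prev, h_s], dim=-1)
        z_t = torch.sigmoid(self.z_gate(gate_in))  # update gate
        r_t = torch.sigmoid(self.r_gate(gate_in))  # reset gate
        cand = torch.tanh(
            self.k_cand(torch.cat([x_t, r_t * h_prev, h_s], dim=-1))
        )
        # Standard GRU combination of previous state and candidate memory.
        return (1.0 - z_t) * h_prev + z_t * cand
```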
- The second decoding layer next-Decoder obtains the below predicted sentence $Y_{i+1}$ corresponding to the current sentence through decoding processing. Predicting the below sentence based on the current sentence conforms to the top-down characteristics of natural language; therefore, the second decoding layer next-Decoder uses the existing GRU model, and the sentence embedding vector $h_s$ is only used as the initial vector of the second decoding layer.
- Predicting the previous sentence of the current sentence based on the encoder-decoder model framework breaks the top-down rule of natural language, increases the difficulty of model training, and enables the model to be trained thoroughly, so that it outputs sentence vector representations with complete semantic and
- grammatical information; furthermore, improving the update gate, reset gate, and candidate memory unit of the GRU model effectively ensures the training efficiency of the model while increasing the difficulty of model training.
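- The overall one-encoder/two-decoder architecture can be sketched as follows, in the spirit of the Skip-Thought Vectors approach cited among the non-patent references. Dimensions, vocabulary size, and the use of teacher forcing are assumptions; for brevity, the pre-Decoder here approximates the modified cell above by feeding $h_s$ at every step of a plain GRU rather than into each gate separately.

```python
import torch
import torch.nn as nn

class ContextSeq2Seq(nn.Module):
    """One GRU encoder with two parallel GRU decoders: pre_decoder predicts
    the above sentence, next_decoder predicts the below sentence."""

    def __init__(self, vocab_size=30000, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # pre_decoder receives h_s concatenated to every input token.
        self.pre_decoder = nn.GRU(emb_dim + hid_dim, hid_dim, batch_first=True)
        # next_decoder is a standard GRU seeded with h_s as its initial state.
        self.next_decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, cur_ids, prev_ids, next_ids):
        _, h_s = self.encoder(self.embed(cur_ids))   # h_s: (1, batch, hid)
        prev_emb = self.embed(prev_ids)              # teacher-forcing inputs
        hs_steps = h_s.transpose(0, 1).expand(-1, prev_emb.size(1), -1)
        pre_out, _ = self.pre_decoder(
            torch.cat([prev_emb, hs_steps], dim=-1), h_s
        )
        next_out, _ = self.next_decoder(self.embed(next_ids), h_s)
        return self.out(pre_out), self.out(next_out)  # token logits
```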
- Step 202: Use the target loss function to train the initial sequence-to-sequence model based on the above predicted sentence and the below predicted sentence of the current sentence, obtaining a trained sequence-to-sequence model.
- The target loss function is determined as the sum of a first loss function and a second loss function: the first loss function in the target loss function is set based on the first decoding layer used to predict the above text, and the second loss function in the target loss function is set based on the second decoding layer used to predict the below text.
- The target loss function is used to train the network parameters of the initialized sequence-to-sequence model until the initialized sequence-to-sequence model converges, and the trained sequence-to-sequence model is obtained.
- The cross-entropy loss function is used as the basic loss function; the specific formula is:
- $CE(S, Y) = -\sum_{j=1}^{l} \log P(y_j = t_j)$
- Here $CE$ represents the cross-entropy loss function; $S$ represents the current sentence; $Y$ represents the predicted sentence generated by the decoding layer Decoder; $l$ represents the number of tokens determined after segmentation of the current sentence $S$; $t_j$ represents the $j$-th token obtained by segmenting the current sentence $S$; and $y_j$ represents the $j$-th token in the predicted sentence $Y$.
- Based on this, the corresponding above-sentence loss function (the first loss function) and below-sentence
- loss function (the second loss function) are determined, and the target loss function of the initialized sequence-to-sequence model is obtained as the sum of the above-sentence loss function and the below-sentence loss function.
- The initialized sequence-to-sequence model is trained until its target loss function value converges;
- training then ends and the trained sequence-to-sequence model is obtained.
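- A sketch of the target loss under the description above: the sum of two token-level cross-entropy losses, one per decoding layer. Tensor shapes and the absence of padding masks are simplifying assumptions.

```python
import torch.nn.functional as F

def target_loss(pre_logits, next_logits, prev_ids, next_ids):
    """Sum of the above-sentence loss (first loss function) and the
    below-sentence loss (second loss function).

    pre_logits/next_logits: (batch, seq_len, vocab) decoder outputs;
    prev_ids/next_ids: (batch, seq_len) target token ids."""
    loss_pre = F.cross_entropy(pre_logits.flatten(0, 1), prev_ids.flatten())
    loss_next = F.cross_entropy(next_logits.flatten(0, 1), next_ids.flatten())
    return loss_pre + loss_next
```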
- Step 203: Perform semantic segmentation on the obtained initial sentence text to obtain segmented sentence text.
- Step 204: Use the pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text.
- The sentence vector generation model is the encoding layer of the trained sequence-to-sequence model.
- In specific implementation, the encoding layer of the trained sequence-to-sequence model is extracted as the sentence vector generation model, so that after a book recommendation request is received, the introduction text corresponding to the book title in the request is obtained, the introduction text is segmented into sentences based on Chinese punctuation, and the Harbin Institute of Technology LTP model is used to perform word segmentation on the segmented introduction text, obtaining sentence text after word segmentation.
- The sentence vector generation model is then used to encode the sentence text to obtain the vector representation of the sentence text.
- Step 205: Calculate the similarity values between the vector representation of the sentence text and the sentence embedding vectors in a preset book sample library, where the sentence embedding vectors in the preset book sample library are obtained from the output of the sentence vector generation model.
- For each book, the sentence vector generation model is used to output the sentence embedding vector corresponding to its introduction text, and the preset book sample library is constructed from these output sentence embedding vectors. The cosine similarity
- algorithm is used to calculate the similarity values between the sentence vector produced for the book recommendation request and the sentence embedding vector corresponding to each book in the preset book sample library.
- Step 206: Generate book recommendation information for the sentence text based on the sentence embedding vectors in the preset book sample library whose similarity values meet the preset conditions.
- When a user browses a book on the platform, that book is used as the target book, and a book recommendation request containing the title of the target book is generated.
- The sentence vector generation model is used to generate the corresponding sentence vectors, the similarity values between the generated sentence vectors and each set of sentence embedding vectors in the platform's preset book sample library are calculated and arranged in descending order, and the book information corresponding to the sentence embedding
- vectors whose similarity values meet the preset conditions is recommended to the user as similar books.
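- The cosine-similarity ranking of steps 205-206 can be sketched as follows; the function name, top_k, and the similarity threshold are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn.functional as F

def recommend_books(query_vec, library_vecs, titles, top_k=5, threshold=0.5):
    """Rank library sentence embedding vectors by cosine similarity to the
    query sentence vector and return book titles meeting the condition."""
    sims = F.cosine_similarity(query_vec.unsqueeze(0), library_vecs, dim=-1)
    order = torch.argsort(sims, descending=True)[:top_k].tolist()
    return [(titles[i], float(sims[i])) for i in order if sims[i] >= threshold]
```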
- According to this solution, the obtained initial sentence text is semantically segmented to obtain segmented sentence text, and the pre-built sentence vector generation model is used to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text.
- The sentence vector generation model is the encoding layer of the trained sequence-to-sequence model, where the trained sequence-to-sequence model is obtained through the following steps: using the initial sequence-to-sequence model, encoding processing and context decoding processing are performed on the current sentence in the context sentence pair sequences of the constructed sentence sample set to obtain the above predicted sentence and the below predicted sentence of the current sentence; according to the above predicted sentence and the below predicted sentence, the trained sequence-to-sequence model is obtained.
- Performing sequence-to-sequence model training based on context sentence pair sequences and using the encoding layer of the trained sequence-to-sequence model to generate sentence vectors can effectively improve the accuracy of sentence vector generation and ensure
- the integrity of the semantic and grammatical information of the generated sentence vectors while increasing the difficulty of model training, thereby effectively avoiding the technical problems of the existing solutions: construction methods based on word vector averaging destroy the dependencies between words in a sentence, resulting in low accuracy of sentence feature extraction; and in
- construction methods based on contrastive learning, the training difficulty of the model is low, the transfer ability of the model in actual tasks is insufficient, and the accuracy of the generated sentence vectors is low.
- An embodiment of the present application provides a sentence vector generation device, as shown in Figure 4.
- the device includes: a model training module 41, a preprocessing module 42, and an encoding module 43.
- The model training module 41 can be used to use the initial sequence-to-sequence model to perform encoding processing and context decoding processing on the current sentence in the context sentence pair sequences of the constructed sentence sample set to obtain the above predicted sentence and the below predicted sentence of the current sentence, and to obtain the trained sequence-to-sequence model based on the above predicted sentence and the below predicted sentence.
- The preprocessing module 42 can be used to perform semantic segmentation on the obtained initial sentence text to obtain segmented sentence text.
- The encoding module 43 can be used to utilize a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text,
- where the sentence vector generation model is the encoding layer of the trained sequence-to-sequence model.
- a book recommendation module 44 is also included.
- the model training module 41 includes a training unit 411.
- The training unit 411 can be used to train the initial sequence-to-sequence model using the target loss function based on the above predicted sentence and the below predicted sentence of the current sentence, obtaining a trained sequence-to-sequence model, where the target loss function is determined as the sum of the first loss function and the second loss function.
- A context sentence pair sequence specifically includes: the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction; and the above target sentence and the below target sentence, which are used to train against
- the output results of the initial sequence-to-sequence model, the output results being the above predicted sentence and the below predicted sentence produced during model training.
- The model training module 41 can be used to perform word segmentation using a word segmentation tool on the context sentence pair sequences to obtain context sentence pair sequences after word segmentation;
- for the current sentence in each context sentence pair sequence after word segmentation, to use the encoding layer of the initial sequence-to-sequence model to obtain the sentence embedding vector of the current sentence;
- and, according to the sentence embedding vector of the current sentence, to use the two decoding layers set up in parallel in the initial sequence-to-sequence model
- to obtain the above predicted sentence and the below predicted sentence respectively, where the two decoding layers are the first decoding layer used to predict the above text and the second decoding layer used to predict the below text.
- The first decoding layer used to predict the above text is a first GRU model,
- and the second decoding layer used to predict the below text is a second GRU model.
- The step of obtaining the above predicted sentence and the below predicted sentence respectively, according to the sentence embedding vector of the current sentence and using the two decoding layers set up in parallel in the initial sequence-to-sequence model, specifically includes:
- using the sentence embedding vector of the current sentence as the input data of the reset gate, update gate, and candidate memory unit in the first GRU model, and obtaining the above predicted sentence of the current sentence through decoding processing;
- and using the sentence embedding vector of the current sentence as the input data of the second GRU model, and obtaining the below predicted sentence of the current sentence through decoding processing.
- The first loss function in the target loss function is set based on the first decoding layer used to predict the above text, and the second loss function in the target loss function is set based on the second decoding layer used to predict the below text.
- the book recommendation module 44 includes a similarity calculation unit 441 and a generation unit 442.
- the similarity calculation unit 441 may be used to calculate the similarity value between the vector representation of the sentence text and the sentence embedding vector in the preset book sample library.
- The generation unit 442 can be configured to generate book recommendation information for the sentence text based on the sentence embedding vectors in the preset book sample library whose similarity values satisfy the preset conditions, where the sentence embedding vectors in the preset book sample library are obtained from the output of the sentence vector generation model.
- The sentence vector generation method includes:
- performing semantic segmentation on the obtained initial sentence text to obtain segmented sentence text; and using a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
- the trained sequence-to-sequence model being obtained through the following steps:
- using the initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in the context sentence pair sequences of the constructed sentence sample set to obtain the above predicted sentence and the below predicted sentence of the current sentence;
- obtaining the trained sequence-to-sequence model according to the above predicted sentence and the below predicted sentence.
- The step of obtaining the trained sequence-to-sequence model based on the above predicted sentence and the below predicted sentence includes:
- using the target loss function to train the initial sequence-to-sequence model to obtain the trained sequence-to-sequence model,
- where the target loss function is determined as the sum of the first loss function and the second loss function.
- A context sentence pair sequence specifically includes:
- the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction; and the above target sentence and the below target sentence, which are used to train against the output results of the initial sequence-to-sequence model, the output results being the above predicted sentence and the below predicted sentence produced during model training.
- The storage medium is a computer-readable storage medium, which may be non-volatile or volatile.
- The technical solution of the present application can be embodied in the form of a software product.
- The software product can be stored in a storage medium (which can be a CD-ROM, USB flash drive, removable hard disk, etc.) and includes a number of instructions that enable
- a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in each implementation scenario of this application.
- Embodiments of the present application also provide a computer device, which can be a personal computer, a server, network equipment, etc.
- The physical device includes a storage medium and a processor; the storage medium is used to store the computer program, and the processor is used to execute the computer program to implement the above sentence vector generation method as shown in Figure 1 and Figure 2, including:
- performing semantic segmentation on the obtained initial sentence text to obtain segmented sentence text; and using a pre-built sentence vector generation model to obtain a vector representation of the sentence text through encoding processing used to predict the context of the sentence text, the sentence vector generation model being the encoding layer of a trained sequence-to-sequence model;
- the trained sequence-to-sequence model being obtained through the following steps:
- using the initial sequence-to-sequence model, performing encoding processing and context decoding processing on the current sentence in the context sentence pair sequences of the constructed sentence sample set to obtain the above predicted sentence and the below predicted sentence of the current sentence;
- obtaining the trained sequence-to-sequence model according to the above predicted sentence and the below predicted sentence.
- The step of obtaining the trained sequence-to-sequence model based on the above predicted sentence and the below predicted sentence includes:
- using the target loss function to train the initial sequence-to-sequence model to obtain the trained sequence-to-sequence model,
- where the target loss function is determined as the sum of the first loss function and the second loss function.
- A context sentence pair sequence specifically includes:
- the current sentence, which is input to the encoding layer of the initial sequence-to-sequence model for context sentence prediction; and the above target sentence and the below target sentence, which are used to train against the output results of the initial sequence-to-sequence model, the output results being the above predicted sentence and the below predicted sentence produced during model training.
- the computer device may also include a user interface, a network interface, a camera, a radio frequency (Radio Frequency, RF) circuit, a sensor, an audio circuit, a WI-FI module, etc.
- the user interface may include a display screen (Display), an input unit such as a keyboard (Keyboard), etc.
- Optionally, the user interface may also include a USB interface, a card reader interface, etc.
- Optional network interfaces may include standard wired interfaces, wireless interfaces (such as Bluetooth interfaces, WI-FI interfaces), etc.
- The structure of the computer device described above does not constitute a limitation on the physical device, which may include more or fewer components, combine certain components, or use a different arrangement of components.
- the storage medium may also include an operating system and a network communication module.
- An operating system is a program that manages the hardware and software resources of a computer device and supports the operation of information processing programs and other software and/or programs.
- the network communication module is used to implement communication between components within the storage medium, as well as communication with other hardware and software in the physical device.
- Through the above description, this embodiment performs sequence-to-sequence model training based on context sentence pair sequences and utilizes the encoding layer of the trained
- sequence-to-sequence model to generate the sentence vectors of sentence texts; this preserves the integrity of the semantic and grammatical information of the sentence text, thereby effectively improving the accuracy of sentence vector generation and effectively avoiding the existing problems: construction
- methods based on word vector averaging destroy the dependencies between words in a sentence, resulting in low accuracy of sentence feature extraction;
- and in construction methods based on contrastive learning, the training difficulty of the model is low and the transfer ability of the model in actual tasks is insufficient.
- The accompanying drawings are only schematic diagrams of a preferred implementation scenario, and the modules or processes in the accompanying drawings are not necessarily required for implementing the present application.
- The modules in the devices of the implementation scenario can be distributed among the devices of the implementation scenario according to the description, or can be changed accordingly and located in one or more devices different from those of the implementation scenario.
- The modules of the above implementation scenarios can be combined into one module or further split into multiple sub-modules.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Business, Economics & Management (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Machine Translation (AREA)
Abstract
This application discloses a sentence vector generation method and apparatus, a computer device, and a storage medium, which relate to the technical field of artificial intelligence and can improve the accuracy of sentence vector generation. The method comprises: performing semantic segmentation on obtained initial sentence text to obtain segmented sentence text; and using a pre-built sentence vector generation model to obtain a vector representation of the sentence text by means of encoding processing used to predict the context of the sentence text, the sentence vector generation model being an encoding layer of a trained sequence-to-sequence model. This application is suitable for book recommendation based on the sentence vectors of book texts.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210232057.9 | 2022-03-09 | ||
CN202210232057.9A CN114444471A (zh) | 2022-03-09 | 2022-03-09 | 句子向量生成方法、装置、计算机设备及存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023168814A1 true WO2023168814A1 (fr) | 2023-09-14 |
Family
ID=81359057
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/089817 WO2023168814A1 (fr) | 2022-03-09 | 2022-04-28 | Procédé et appareil de génération de vecteur de phrase, dispositif informatique et support de stockage |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114444471A (fr) |
WO (1) | WO2023168814A1 (fr) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111178082A (zh) * | 2019-12-05 | 2020-05-19 | 北京葡萄智学科技有限公司 | 一种句向量生成方法、装置及电子设备 |
US20200218780A1 (en) * | 2019-01-03 | 2020-07-09 | International Business Machines Corporation | Automated contextual dialog generation for cognitive conversation |
WO2020151688A1 (fr) * | 2019-01-24 | 2020-07-30 | 腾讯科技(深圳)有限公司 | Procédé et dispositif de codage, équipement et support de stockage |
CN111602128A (zh) * | 2017-10-27 | 2020-08-28 | 巴比伦合伙有限公司 | 计算机实现的确定方法和系统 |
CN112052329A (zh) * | 2020-09-02 | 2020-12-08 | 平安科技(深圳)有限公司 | 文本摘要生成方法、装置、计算机设备及可读存储介质 |
- 2022-03-09: CN application CN202210232057.9A filed; publication CN114444471A (zh); status: not active (withdrawn)
- 2022-04-28: PCT application PCT/CN2022/089817 filed; publication WO2023168814A1 (fr)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111602128A (zh) * | 2017-10-27 | 2020-08-28 | 巴比伦合伙有限公司 | 计算机实现的确定方法和系统 |
US20200218780A1 (en) * | 2019-01-03 | 2020-07-09 | International Business Machines Corporation | Automated contextual dialog generation for cognitive conversation |
WO2020151688A1 (fr) * | 2019-01-24 | 2020-07-30 | 腾讯科技(深圳)有限公司 | Procédé et dispositif de codage, équipement et support de stockage |
CN111178082A (zh) * | 2019-12-05 | 2020-05-19 | 北京葡萄智学科技有限公司 | 一种句向量生成方法、装置及电子设备 |
CN112052329A (zh) * | 2020-09-02 | 2020-12-08 | 平安科技(深圳)有限公司 | 文本摘要生成方法、装置、计算机设备及可读存储介质 |
Non-Patent Citations (1)
Title |
---|
RYAN KIROS, YUKUN ZHU, RUSLAN SALAKHUTDINOV, RICHARD S ZEMEL, ANTONIO TORRALBA, RAQUEL URTASUN, SANJA FIDLER: "Skip-Thought Vectors", 22 June 2015 (2015-06-22), XP055428189, Retrieved from the Internet <URL:https://arxiv.org/pdf/1506.06726.pdf> * |
Also Published As
Publication number | Publication date |
---|---|
CN114444471A (zh) | 2022-05-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111444340B (zh) | 文本分类方法、装置、设备及存储介质 | |
WO2022007823A1 (fr) | Procédé et dispositif de traitement de données de texte | |
WO2022022421A1 (fr) | Système de modèle de représentation de langage, procédé et appareil de pré-apprentissage, dispositif et support | |
CN113051356B (zh) | 开放关系抽取方法、装置、电子设备及存储介质 | |
CN111967266A (zh) | 中文命名实体识别模型及其构建方法和应用 | |
CN112131366A (zh) | 训练文本分类模型及文本分类的方法、装置及存储介质 | |
CN110163181B (zh) | 手语识别方法及装置 | |
WO2020244475A1 (fr) | Procédé et appareil d'étiquetage de séquence de langue, support de stockage et dispositif informatique | |
CN111159485B (zh) | 尾实体链接方法、装置、服务器及存储介质 | |
US11487971B2 (en) | Multi-dimensional language style transfer | |
CN113421551B (zh) | 语音识别方法、装置、计算机可读介质及电子设备 | |
CN111145914B (zh) | 一种确定肺癌临床病种库文本实体的方法及装置 | |
WO2024199423A1 (fr) | Procédé de traitement de données et dispositif associé | |
WO2023116572A1 (fr) | Procédé de génération de mots ou de phrases et dispositif associé | |
CN116432646A (zh) | 预训练语言模型的训练方法、实体信息识别方法及装置 | |
CN116050425A (zh) | 建立预训练语言模型的方法、文本预测方法及装置 | |
WO2022228127A1 (fr) | Procédé et appareil de traitement de texte d'élément, dispositif électronique et support de stockage | |
JP2023002690A (ja) | セマンティックス認識方法、装置、電子機器及び記憶媒体 | |
CN118013045B (zh) | 基于人工智能的语句情感检测方法及装置 | |
CN114492661B (zh) | 文本数据分类方法和装置、计算机设备、存储介质 | |
CN115408488A (zh) | 用于小说场景文本的分割方法及系统 | |
CN116956816A (zh) | 文本处理方法、模型训练方法、装置及电子设备 | |
CN116258147A (zh) | 一种基于异构图卷积的多模态评论情感分析方法及系统 | |
CN117708568B (zh) | 大语言模型的特征提取方法、装置、计算机设备及介质 | |
CN115114407A (zh) | 意图识别方法、装置、计算机设备及存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22930443 Country of ref document: EP Kind code of ref document: A1 |