CN116108128A - Open domain question-answering system and answer prediction method - Google Patents

Open domain question-answering system and answer prediction method

Info

Publication number
CN116108128A
Authority
CN
China
Prior art keywords
paragraph
question
vectors
binary code
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310389053.6A
Other languages
Chinese (zh)
Other versions
CN116108128B (en)
Inventor
张准
苏俊杰
马琼雄
王一辰
黄俊鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University filed Critical South China Normal University
Priority to CN202310389053.6A priority Critical patent/CN116108128B/en
Publication of CN116108128A publication Critical patent/CN116108128A/en
Application granted granted Critical
Publication of CN116108128B publication Critical patent/CN116108128B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • G06F16/325Hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • G06F16/328Management therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3334Selection or weighting of terms from queries, including natural language queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3347Query execution using vector based model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to an open domain question-answering system comprising a vector converter, a retriever, a paragraph index library, a supporting document indexer and an answer generator, wherein the retriever comprises a feature extraction module, a question linear layer, a question hash layer, a paragraph linear layer and a paragraph hash layer. Paragraphs of the knowledge base documents are processed by the retriever into paragraph binary codes stored in the paragraph index library; a question is processed by the retriever into a question continuous vector and a question binary code; the supporting document indexer screens the supporting documents out of the paragraph index library; and after the question vector and the supporting document vectors are spliced, the answer generator produces a predicted answer. The question-answering system can efficiently retrieve the supporting paragraphs most relevant to the question; by using hash layers to compress the continuous vectors output by the linear layers into binary codes, it reduces the storage space of the index memory; and its two-stage paragraph indexing, first by binary code and then by continuous vector, effectively reduces retrieval time.

Description

Open domain question-answering system and answer prediction method
Technical Field
The invention relates to the technical field of question-answering systems, and in particular to an open domain question-answering system and an answer prediction method.
Background
In 2017, Facebook designed the DrQA open domain question-answering system and proposed the two-stage retriever-reader framework: to answer a question, documents relevant to the question are first retrieved from large-scale web resources, a reader then understands the semantics of those documents, and finally the answer is extracted. This two-stage retriever-reader framework has become the dominant paradigm for open domain question-answering systems.
The original DrQA system, and the retrievers of most other systems, employ classical information retrieval (IR) based on sparse word matching; ElasticSearch provides a convenient way to index documents, so nearest neighbor search can be performed using the BM25 similarity function (word-dependent TF-IDF weighting). This word-matching approach has obvious limitations: it does not account for synonyms or grammatical variations, and it cannot capture the semantics of a whole sentence.
Recent retriever construction schemes mainly rely on dense vector representations and deep learning models. While these achieve good results, they also bring problems: dense representations must be computed for large-scale paragraphs or articles, so the index occupies a large memory footprint and retrieval is slow; and supervised retriever pre-training places high demands on the dataset, since the relevant paragraphs and answers in the document resources must be annotated, incurring a huge labeling cost. Reader construction schemes mainly adopt deep learning models, from LSTM models to BERT models, whose principle is to extract the answer span from the retrieved document paragraphs; however, when the answer cannot simply be extracted from the document, such extraction-based reader models fail.
Disclosure of Invention
Based on the above, the invention aims to provide a novel answer prediction method for open domain question answering.
An answer prediction method for open domain question answering comprises the following steps:
S-1A: convert an input question into word vectors, and add paragraph features and position information to the word vectors to obtain a sentence matrix fusing the word vectors, paragraph features and position information;
S-2A: perform semantic feature extraction on the input sentence matrix to obtain a semantic feature matrix, and classify the semantic feature matrix as a question feature matrix according to its sentence head vector;
S-3A: apply a linear transformation to the question feature matrix to obtain a question continuous vector;
S-4A: convert the question continuous vector into a binary code to obtain a question binary code;
S-5: using first the question binary code and then the question continuous vector, screen out the K paragraphs with the largest inner product values from a paragraph index library as the supporting documents of the question, obtaining the question supporting documents; the paragraph index library stores the paragraph binary codes of the documents of an existing knowledge base;
S-6: convert the input question into a question vector, convert the question supporting documents into question supporting document vectors, splice the question vector with the question supporting document vectors to obtain a question-document splice vector, and generate a predicted answer from the question-document splice vector using a generate function with a greedy decoding algorithm.
Further, step S-5 specifically comprises the following steps (a minimal code sketch follows the steps):
S-51: calculate the Hamming distance between the question binary code and each paragraph binary code in the paragraph index library, and screen out the m paragraphs whose Hamming distance to the question binary code is smallest;
S-52: perform an inner product operation between the question continuous vector and those m paragraphs, and screen out the K paragraphs with the largest inner product values as the supporting documents of the question, obtaining the question supporting documents.
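The two-stage screening in steps S-51 and S-52 can be illustrated with the following minimal NumPy sketch; the array names (question_code, paragraph_codes, question_vec, paragraph_vecs) and the byte-packed layout of the binary codes are assumptions for illustration, not the patent's implementation.

```python
import numpy as np

def two_stage_search(question_code, paragraph_codes, question_vec, paragraph_vecs, m, k):
    """Step S-51: Hamming pre-screen on binary codes; step S-52: inner-product rerank."""
    # Hamming distance = popcount of the XOR between byte-packed uint8 codes.
    xor = np.bitwise_xor(paragraph_codes, question_code)   # (N, bits // 8)
    hamming = np.unpackbits(xor, axis=1).sum(axis=1)       # (N,)
    candidates = np.argsort(hamming)[:m]                   # m codes closest to the question

    # Inner product between the question continuous vector and the m paragraphs.
    scores = paragraph_vecs[candidates] @ question_vec     # (m,)
    return candidates[np.argsort(-scores)[:k]]             # indices of the K support paragraphs
```

Because the Hamming stage works on compact binary codes, only m paragraphs ever reach the floating-point inner-product stage, which is where the claimed saving in retrieval time comes from.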
The invention also provides an open domain question-answering system, which comprises:
a vector converter for converting an input question into word vectors and adding paragraph features and position information to the word vectors to obtain a sentence matrix fusing the word vectors, paragraph features and position information;
a retriever comprising a feature extraction module, a question linear layer and a question hash layer, wherein the feature extraction module performs semantic feature extraction on the input sentence matrix to obtain a semantic feature matrix and classifies the semantic feature matrix as a question feature matrix according to its sentence head vector; the question linear layer applies a linear transformation to the question feature matrix to obtain a question continuous vector; and the question hash layer converts the question continuous vector into a binary code to obtain a question binary code;
a paragraph index library storing the paragraph binary codes of the documents of an existing knowledge base;
a supporting document indexer for screening out, using first the question binary code and then the question continuous vector, the K paragraphs with the largest inner product values from the paragraph index library as the supporting documents of the question, obtaining the question supporting documents;
an answer generator for converting the input question into a question vector, converting the question supporting documents into question supporting document vectors, splicing the question vector with the question supporting document vectors to obtain a question-document splice vector, and generating a predicted answer from the question-document splice vector using a generate function with a greedy decoding algorithm.
Further, the retriever also comprises:
a paragraph linear layer for applying a linear transformation to the paragraph feature matrix to obtain paragraph continuous vectors;
a paragraph hash layer for converting the paragraph continuous vectors into binary codes to obtain paragraph binary codes;
wherein the paragraph binary codes of the knowledge base documents stored in the paragraph index library are obtained by truncating documents acquired from a large-scale knowledge base of the vertical domain into paragraphs of a fixed number of words, and processing the paragraphs sequentially with the vector converter, the feature extraction module, the paragraph linear layer and the paragraph hash layer.
Compared with the prior art, the question-answering system and answer prediction method of the invention use, as the retriever, a feature extraction model followed by linear layers and hash layers, which can efficiently retrieve the supporting paragraphs highly relevant to the question while reducing the model parameters of the retriever; the hash layers compress the continuous vectors output by the linear layers into binary codes, reducing the storage space of the index memory; and the two-stage paragraph retrieval, first indexing by the question binary code and then by the question continuous vector, effectively reduces retrieval time while maximizing the relevance between the retrieved supporting paragraphs and the question, so that answers with lower perplexity are generated.
For a better understanding and implementation, the present invention is described in detail below with reference to the drawings.
Drawings
FIG. 1 is a schematic block diagram of an open domain question-answering system of the present invention;
FIG. 2 is a flowchart of the operation of the open domain question-answering system of FIG. 1;
FIG. 3 is a schematic diagram of a training method of the retriever of FIG. 1;
FIG. 4 is a flowchart of answer prediction in the open domain question-answering system of the present invention.
Detailed Description
Firstly, the paragraphs of the documents of the existing knowledge base are converted into paragraph binary codes by the retriever and stored in the paragraph index library. Then, an input question is converted by the retriever into a question continuous vector and a question binary code. In the supporting document indexer, the Hamming distances between the question binary code and the paragraph binary codes in the paragraph index library are calculated, the m paragraphs closest in Hamming distance are screened out, the inner products between the question continuous vector and the m paragraphs are calculated, and the K paragraphs with the largest inner product values are screened out of the m paragraphs as the supporting documents. Finally, the question vector and the supporting document vectors are spliced by the answer generator, and a predicted answer is generated using a greedy decoding algorithm.
The open domain question-answering system of the present invention, including its construction, optimization and operation, is described in detail below.
(I) Open domain question-answering system
Referring to FIG. 1 and FIG. 2, FIG. 1 is a schematic block diagram of the open domain question-answering system according to the present invention, and FIG. 2 is a flowchart of the operation of the system shown in FIG. 1. The open domain question-answering system of the present invention includes a vector converter, a retriever, a paragraph index library, a supporting document indexer, and an answer generator.
Specifically, the vector converter is configured to perform step S-1: convert the input sentence into word vectors, and add paragraph features and position information to the word vectors to obtain a sentence matrix fusing the word vectors, paragraph features and position information. When the input sentence is a question, this step is labeled S-1A; when the input sentence is an answer or a paragraph, this step is labeled S-1B.
The retriever comprises a feature extraction module, a question linear layer, a question hash layer, a paragraph linear layer and a paragraph hash layer.
The feature extraction module is used to execute step S-2: perform semantic feature extraction on the input sentence matrix to obtain a semantic feature matrix, and classify the semantic feature matrix as a question feature matrix or a paragraph feature matrix according to its sentence head vector. When the semantic feature matrix is classified as a question feature matrix, this step is labeled S-2A; when it is classified as a paragraph feature matrix, this step is labeled S-2B.
The question linear layer is used to execute step S-3A: apply a linear transformation to the question feature matrix to obtain a question continuous vector, and store the question continuous vector.
The question hash layer is used to execute step S-4A: convert the question continuous vector into a binary code to obtain a question binary code, and store the question binary code.
The paragraph linear layer is used to execute step S-3B: apply a linear transformation to the paragraph feature matrix to obtain paragraph continuous vectors, and store the paragraph continuous vectors.
The paragraph hash layer is used to execute step S-4B: convert the paragraph continuous vectors into binary codes to obtain paragraph binary codes, and store the paragraph binary codes.
When the input sentence is a question, the feature extraction module, the question linear layer and the question hash layer form the question encoder; when the input sentence is an answer or a paragraph, the feature extraction module, the paragraph linear layer and the paragraph hash layer form the paragraph encoder. In this embodiment, the feature extraction module uses a BERT or ALBERT model whose [CLS] output is taken as the sentence representation. Further, the output dimension of the question linear layer and the paragraph linear layer is 128; the question hash layer and the paragraph hash layer adopt a tanh function, with the expression

$$h = \tanh(\alpha v)$$

where $\alpha$ denotes a manual adjustment value and $v$ denotes the continuous vector output by the question linear layer or the paragraph linear layer.
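As an illustration of the linear layer plus hash layer branch just described, a minimal PyTorch sketch follows; the class name, the treatment of the adjustment value α as a fixed constructor argument, and the sign-based binarization at inference are assumptions, since the patent specifies only a 128-dimensional linear output and a tanh hash layer with a manually adjusted value.

```python
import torch
import torch.nn as nn

class LinearHashHead(nn.Module):
    """Question or paragraph linear layer followed by the tanh hash layer."""
    def __init__(self, hidden_size=768, out_dim=128, alpha=1.0):
        super().__init__()
        self.linear = nn.Linear(hidden_size, out_dim)   # steps S-3A / S-3B
        self.alpha = alpha                              # manual adjustment value

    def forward(self, cls_vec):
        v = self.linear(cls_vec)           # continuous vector
        h = torch.tanh(self.alpha * v)     # soft binary code used during training
        return v, h

    @torch.no_grad()
    def binarize(self, cls_vec):
        _, h = self.forward(cls_vec)
        return torch.sign(h)               # hard ±1 code, assumed form of the stored index entry
```

Keeping the tanh during training leaves the hash layer differentiable, so the continuous vectors and the (soft) binary codes can both enter the loss functions described later.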
The paragraph index library stores the paragraph binary codes of the knowledge base documents. Specifically, documents are first obtained from a large-scale knowledge base of the vertical domain and truncated into paragraphs of a fixed number of words; the paragraphs are input to the paragraph encoder of the retriever, which executes steps S-2, S-3B and S-4B to obtain all paragraph binary codes $h_p$, stored row by row in the paragraph index library together with the mapping between the index library and the paragraph documents.
The supporting document indexer is configured to perform step S-5: using first the question binary code and then the question continuous vector, screen out the K paragraphs with the largest inner product values from the paragraph index library as the question supporting documents. Specifically, the maximum inner product search function of the faiss library is adopted: the Hamming distances between the question binary code and all paragraph binary codes $h_p$ in the paragraph index library are calculated first, and the m paragraphs closest to the question binary code in Hamming distance are screened out; an inner product operation is then performed between the question continuous vector and those m paragraphs, and the K paragraphs with the largest inner product values are screened out of the m paragraphs to form the question supporting document. Further, the supporting document composed of the K paragraphs has the format "[SEP] paragraph 1 [SEP] paragraph 2 ... [SEP] paragraph K", where [SEP] denotes the separator symbol (rendered as an image in the original).
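A hedged sketch of this indexing step with the faiss library follows; the patent names faiss's maximum inner product search, whereas the binary index class (IndexBinaryFlat), the 128-bit code width, the screening sizes, and the variables paragraph_codes, paragraph_vecs, question_code, question_vec and paragraphs (assumed to come from the indexing stage above) are all illustrative assumptions.

```python
import faiss
import numpy as np

nbits = 128                                  # code width (assumed)
index = faiss.IndexBinaryFlat(nbits)         # exact Hamming-distance index
index.add(paragraph_codes)                   # (N, nbits // 8) uint8 paragraph codes

m, k = 100, 10                               # screening sizes (assumed values)
_, cand = index.search(question_code[None, :], m)   # m nearest codes by Hamming distance
cand = cand[0]

scores = paragraph_vecs[cand] @ question_vec        # inner-product rerank (step S-52)
support_ids = cand[np.argsort(-scores)[:k]]

# Concatenate the K paragraphs in the stated supporting-document format.
support_doc = "".join("[SEP]" + paragraphs[i] for i in support_ids)
```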
The answer generator is configured to perform step S-6: convert the input question into a question vector, splice the question vector with the question supporting document vectors to obtain a question-document splice vector, and generate a predicted answer from the question-document splice vector using a generate function with a greedy decoding algorithm. In this embodiment, the answer generator is a BART model followed by a linear layer with a language model head. Further, the splice format of the question-document splice vector is "question: { } supporting documents: { }".
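For step S-6, a minimal sketch using the Hugging Face transformers API is given below; the checkpoint name fnlp/bart-large-chinese, the English rendering of the splice template, the generation length, and the variables question and support_doc (assumed defined as above) are assumptions; the patent specifies only a BART model with a language model head, the splice format and greedy decoding.

```python
from transformers import AutoTokenizer, BartForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("fnlp/bart-large-chinese")   # assumed checkpoint
model = BartForConditionalGeneration.from_pretrained("fnlp/bart-large-chinese")

# Splice format from step S-6: "question: { } supporting documents: { }".
text = f"question: {question} supporting documents: {support_doc}"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)

# num_beams=1 with do_sample=False is greedy decoding in transformers.
output_ids = model.generate(**inputs, max_length=128, num_beams=1, do_sample=False)
answer = tokenizer.decode(output_ids[0], skip_special_tokens=True)
```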
(II) Optimization of the open domain question-answering system
After the open domain question-answering system is built, it needs to be trained so that it evolves from the initial system into the optimized system. The retriever is trained first, and the answer generator is then trained and/or fine-tuned on the basis of the trained retriever to obtain the optimized open domain question-answering system.
Meanwhile, the paragraph binary code data stored in the paragraph index library is also obtained with the trained retriever. Specifically, the paragraphs of the knowledge base documents are input into the trained retriever and processed by its paragraph encoder to form the paragraph binary codes.
The training method of the open domain question-answering system will be specifically described below.
1. Training retriever
1.1 Creating a first data set for training a retriever
A retrieval training dataset needs to be built before training the retriever.
Question-answer pair data is obtained from the open domain, and all question-answer pairs are made into a (question, answer) mapping dataset. Specifically, the data contains questions Q and answers A; a question-answer pair $(q_i, a_i)$ is defined as a positive sample, and $(q_i, a_j)$ with $j \neq i$ as a negative sample.
Further, the batch size of the dataset is 1024. As shown in Table 1, $(q_i, a_i)$ is the positive sample and the other pairs $(q_i, a_j)$, $j = 1, \dots, 1024$, $j \neq i$, are negative samples.
TABLE 1. A batch of question-answer pairs (the table is rendered as an image in the original document).
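In code, one such batch amounts to a B×B inner-product matrix whose diagonal holds the positive pairs; the sketch below is illustrative, and the variable names are assumptions.

```python
import torch

B = 1024                                 # batch size from Table 1
# question_vecs, answer_vecs: (B, 128) outputs of the question/paragraph encoders.
sim = question_vecs @ answer_vecs.T      # sim[i, j] = inner product of q_i and a_j
labels = torch.arange(B)                 # column i is the positive answer for question i
# Row i: sim[i, i] scores the positive pair (q_i, a_i); every other column is a negative.
```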
1.2 Training
Please refer to FIG. 3, which illustrates the training method of the retriever. The training process of the retriever is as follows:
S-1C: convert a batch of questions and answers from the input first dataset into word vectors respectively, and add paragraph features and position information to the word vectors to obtain sentence matrices fusing the word vectors, paragraph features and position information.
S-2: perform semantic feature extraction on the input sentence matrices to obtain semantic feature matrices, and classify each semantic feature matrix as a question feature matrix or an answer feature matrix according to its sentence head vector.
S-3A: apply a linear transformation to the question feature matrix to obtain the question continuous vector $v_q$, and store $v_q$.
S-4A: convert the question continuous vector $v_q$ into a binary code to obtain the question binary code $h_q$, and store $h_q$.
S-3B: apply a linear transformation to the answer feature matrix to obtain the paragraph continuous vectors $v_p$, which comprise the paragraph positive sample continuous vector $v_{p^+}$ and the question-irrelevant paragraph negative sample continuous vectors $v_{p^-}$; store $v_{p^+}$ and $v_{p^-}$.
S-4B: convert the paragraph continuous vectors into binary codes to obtain the paragraph binary codes $h_p$, which comprise the paragraph positive sample binary code $h_{p^+}$ and the paragraph negative sample binary codes $h_{p^-}$; store $h_{p^+}$ and $h_{p^-}$.
SA-5: feed the question continuous vector $v_q$, the question binary code $h_q$, the paragraph positive sample continuous vector $v_{p^+}$, the paragraph negative sample continuous vectors $v_{p^-}$, the paragraph positive sample binary code $h_{p^+}$ and the paragraph negative sample binary codes $h_{p^-}$ through a forward propagation pass, compute the loss value of each task according to the task loss functions set for the retriever, and compute the final loss value from the individual task loss values.
Specifically, 4 task loss functions are set for the retriever, each minimizing the negative log-likelihood of the paragraph positive sample. Each loss function (formulas (1)-(4), rendered as images in the original) takes the form

$$\mathcal{L}_k = -\log \frac{\exp(\langle q, p^{+} \rangle)}{\exp(\langle q, p^{+} \rangle) + \sum_{p^{-}} \exp(\langle q, p^{-} \rangle)}, \qquad k = 1, \dots, 4,$$

where $q$ denotes the representation of the current question ($v_q$ its continuous vector, $h_q$ its binary code), $p^{+}$ the paragraph positive sample associated with the current question ($v_{p^+}$ its continuous vector, $h_{p^+}$ its binary code), and $p^{-}$ the paragraph negative samples irrelevant to the current question ($v_{p^-}$ their continuous vectors, $h_{p^-}$ their binary codes).
The final loss value $\mathcal{L}$ is a weighted sum of the loss values calculated by formulas (1)-(4), satisfying the relation

$$\mathcal{L} = \lambda_1 \mathcal{L}_1 + \lambda_2 \mathcal{L}_2 + \lambda_3 \mathcal{L}_3 + \lambda_4 \mathcal{L}_4 \qquad (5)$$

where $\lambda_1, \dots, \lambda_4$ are the weighting coefficients of the 4 loss values. (A code sketch of these four tasks is given at the end of this subsection.)
SA-6: update the parameters of the feature extraction module, the question linear layer and the paragraph linear layer through a back-propagation algorithm according to the final loss value.
Steps S-1C to SA-6 are repeated until the loss value stabilizes, or the number of iterations is reached, or the loss falls below an iteration threshold; the trained retriever is then obtained and saved.
In this embodiment, the hyperparameters for retriever training are set to learning_rate = 2×10⁻⁴ and maximum input max_length = 256, and the optimizer is the Adam optimizer.
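The four-task loss can be sketched as follows; the patent states that each of the four losses is the negative log-likelihood of the paragraph positive sample and that the final loss is their weighted sum, so the specific pairing assumed here (the four combinations of continuous vector and binary code for question and paragraph) is an illustrative guess rather than the confirmed formulation.

```python
import torch
import torch.nn.functional as F

def retriever_loss(v_q, h_q, v_p, h_p, weights=(1.0, 1.0, 1.0, 1.0)):
    """v_q, h_q: (B, d) question vectors / codes; v_p, h_p: (B, d) paragraph ones.

    Row i of each similarity matrix scores question i against every paragraph in
    the batch; the diagonal entry is the positive sample, so cross entropy with
    labels [0..B-1] is exactly the in-batch negative log-likelihood of the
    positive paragraph, as in formulas (1)-(4).
    """
    labels = torch.arange(v_q.size(0), device=v_q.device)
    losses = [
        F.cross_entropy(v_q @ v_p.T, labels),   # continuous  vs continuous
        F.cross_entropy(v_q @ h_p.T, labels),   # continuous  vs binary code
        F.cross_entropy(h_q @ v_p.T, labels),   # binary code vs continuous
        F.cross_entropy(h_q @ h_p.T, labels),   # binary code vs binary code
    ]
    return sum(w * l for w, l in zip(weights, losses))   # weighted sum, formula (5)
```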
2. Training answer generator
2.1 Creating a second data set for training the answer generator
Based on the trained retriever, an answer training dataset needs to be built before training the answer generator, specifically as follows.
(1) Acquire a corpus knowledge base of the vertical domain, and randomly intercept, with a fixed probability, sentences of fixed length from the corpus knowledge base as the questions of the answer generator training dataset.
(2) Input the questions into the question encoder of the trained retriever to obtain the question continuous vector $v_q$ and the question binary code $h_q$; input $v_q$ and $h_q$ into the supporting document indexer; obtain m paragraphs by computing the Hamming distance between the question binary code $h_q$ and the paragraph binary codes in the paragraph index library; compute the inner products between the question continuous vector $v_q$ and the m paragraphs, and screen out the K paragraphs with the largest inner product values to form the question supporting document; take the question supporting document as the answer, and use that answer as the second label.
Steps (1) and (2) are repeated to obtain the second labels corresponding to all questions, and a dataset mapping questions to second labels is produced.
2.2 Training
SC-1: convert a batch of questions from the second dataset (the questions in the question-to-second-label mapping) into vectors to obtain question vectors.
SC-2: perform feature extraction on the input question vectors, and generate second predicted answers using a generate function with a greedy decoding algorithm.
SC-3: compute the cross entropy between the second predicted answer and the second label, take the cross entropy as the second loss value, and update the parameters of the answer generator through a back-propagation algorithm, specifically the parameters of the BART model and of the linear layer with the language model head.
Steps SC-1 to SC-3 are repeated until the loss value stabilizes, or the number of iterations is reached, or the loss falls below an iteration threshold; the trained answer generator is then obtained and saved.
In this embodiment, the BART model of the answer generator is initialized with the bart-large-chinese weights; the training hyperparameters are set to learning_rate = 2×10⁻⁴ and maximum input max_length = 1024, and the optimizer is the Adam optimizer.
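A minimal training-step sketch for steps SC-1 to SC-3 follows, assuming the Hugging Face seq2seq interface in which passing labels returns the token-level cross entropy; tokenizer and model are as in the generation sketch above, and batch_questions / batch_second_labels are assumed lists of strings.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)   # hyperparameters from this section

enc = tokenizer(batch_questions, return_tensors="pt", padding=True,
                truncation=True, max_length=1024)
tgt = tokenizer(batch_second_labels, return_tensors="pt", padding=True,
                truncation=True, max_length=1024)
# Pad positions are masked to -100 so they do not contribute to the cross entropy.
labels = tgt.input_ids.masked_fill(tgt.input_ids == tokenizer.pad_token_id, -100)

out = model(input_ids=enc.input_ids, attention_mask=enc.attention_mask,
            labels=labels)                # SC-3: cross entropy as the second loss
out.loss.backward()                       # back-propagation
optimizer.step()
optimizer.zero_grad()
```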
3. Fine tuning answer generator
3.1 Creating a third data set for fine tuning the answer generator
Based on the trained answer generator, a fine-tuning dataset needs to be established before fine-tuning the answer generator, specifically as follows.
(1) Obtain question-answer pair data of the vertical domain.
(2) Input the questions into the question encoder of the trained retriever to obtain the question continuous vector $v_q$ and the question binary code $h_q$; input $v_q$ and $h_q$ into the supporting document indexer; obtain m paragraphs by computing the Hamming distance between the question binary code $h_q$ and the paragraph binary codes in the paragraph index library; compute the inner products between the question continuous vector $v_q$ and the m paragraphs, and screen out the K paragraphs with the largest inner product values to form the question supporting document.
Steps (1) and (2) are repeated to obtain the question supporting documents of all questions, and a dataset mapping (question, question supporting document) to a third label is produced, where the third label is the answer paragraph in the vertical domain question-answer pair data.
3.2 Fine tuning
SD-1: convert a batch of questions and question supporting documents from the third dataset into vectors to obtain question vectors and supporting document vectors, and splice the question vectors with the supporting document vectors to obtain question-document splice vectors.
SD-2: perform feature extraction on the input question-document splice vectors, and generate third predicted answers using a generate function with a greedy decoding algorithm.
SD-3: compute the cross entropy between the third predicted answer and the third label, take the cross entropy as the third loss value, and update the parameters of the trained answer generator through a back-propagation algorithm, specifically the parameters of the BART model and of the linear layer with the language model head.
Steps SD-1 to SD-3 are repeated until the loss value stabilizes, or the number of iterations is reached, or the loss falls below an iteration threshold; the fine-tuned answer generator is then obtained and saved.
In this embodiment, the fine-tuning hyperparameters are set to learning_rate = 2×10⁻⁵ and maximum input max_length = 1024, and the optimizer is the Adam optimizer.
The initial question-answering system constructed by the invention is optimized through the above steps, obtaining the optimized question-answering system.
(III) Operation of the question-answering system
Referring to FIG. 4, the present invention generates a predicted answer to a question by running the following steps on the optimized open domain question-answering system.
S-1A: convert the input question into word vectors, and add paragraph features and position information to the word vectors to obtain a sentence matrix fusing the word vectors, paragraph features and position information.
S-2A: perform semantic feature extraction on the input sentence matrix to obtain a semantic feature matrix, and classify the semantic feature matrix as a question feature matrix according to its sentence head vector.
S-3A: apply a linear transformation to the question feature matrix to obtain a question continuous vector.
S-4A: convert the question continuous vector into a binary code to obtain a question binary code.
S-5: using first the question binary code and then the question continuous vector, screen out the K paragraphs with the largest inner product values from the paragraph index library as the supporting documents of the question, obtaining the question supporting documents.
S-6: convert the input question into a question vector, convert the question supporting documents into question supporting document vectors, splice the question vector with the question supporting document vectors to obtain a question-document splice vector, and generate a predicted answer from the question-document splice vector using a generate function with a greedy decoding algorithm.
According to the invention, the feature extraction module uses only a single feature extraction model, combined at its classification output with the question linear layer-question hash layer and paragraph linear layer-paragraph hash layer structures to form the retriever. This greatly reduces the model parameters of the retriever, compresses the continuous vectors into binary codes to reduce the storage space of the index memory, and uses both the continuous vectors and the binary codes to compute the model's loss function so that the model can be trained effectively. The supporting-document retrieval method, which first screens paragraphs in the paragraph index library with a vector search tool (faiss) using the binary codes and then further screens them with the continuous vectors, can retrieve the supporting documents most relevant to the question and reduces the time spent retrieving high-quality answers.
Meanwhile, the unsupervised training method for the answer generator of the open domain question-answering system provided by the invention makes the output predicted labels closer to the style of answers. At the same time, a knowledge-enhanced encoder is used, i.e., the question and the supporting documents are spliced together and input into the encoder, which, compared with inputting the question alone, outputs answers with lower perplexity.
Based on the same inventive concept, the present application also provides an electronic device, which may be a terminal device such as a server, a desktop computing device, or a mobile computing device (e.g., a laptop, a handheld device, a tablet computer, a netbook, etc.). The device comprises one or more processors and a memory, the processors being configured to execute a program to implement the answer prediction method for open domain question answering, and the memory being used to store a computer program executable by the processors.
Based on the same inventive concept, and corresponding to the foregoing embodiments of the answer prediction method for open domain question answering, the present application further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the answer prediction method described in any of the foregoing embodiments.
The present application may take the form of a computer program product embodied on one or more storage media (including, but not limited to, magnetic disk storage, CD-ROM, optical storage, etc.) having program code embodied therein. Computer-usable storage media include both permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to: phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, may be used to store information that may be accessed by the computing device.
The above examples illustrate only a few embodiments of the invention, which are described specifically and in detail, but they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make modifications and improvements without departing from the spirit of the invention, and such modifications and improvements are intended to fall within the scope of the invention.

Claims (10)

1. An answer prediction method for open domain question answering, characterized by comprising the following steps:
S-1A: converting an input question into word vectors, and adding paragraph features and position information to the word vectors to obtain a sentence matrix fusing the word vectors, paragraph features and position information;
S-2A: performing semantic feature extraction on the input sentence matrix to obtain a semantic feature matrix, and classifying the semantic feature matrix as a question feature matrix according to its sentence head vector;
S-3A: applying a linear transformation to the question feature matrix to obtain a question continuous vector;
S-4A: converting the question continuous vector into a binary code to obtain a question binary code;
S-5: using first the question binary code and then the question continuous vector, screening out the K paragraphs with the largest inner product values from a paragraph index library as the supporting documents of the question, obtaining the question supporting documents; wherein the paragraph index library stores the paragraph binary codes of the documents of an existing knowledge base;
S-6: converting the input question into a question vector, converting the question supporting documents into question supporting document vectors, splicing the question vector with the question supporting document vectors to obtain a question-document splice vector, and generating a predicted answer from the question-document splice vector using a generate function with a greedy decoding algorithm.
2. The answer prediction method for open domain question answering according to claim 1, wherein the paragraph binary codes of the knowledge base documents stored in the paragraph index library are obtained by:
S-0: acquiring documents from a large-scale knowledge base of the vertical domain, and truncating the documents into paragraphs of a fixed number of words;
S-1B: converting an input paragraph into word vectors, and adding paragraph features and position information to the word vectors to obtain a sentence matrix fusing the word vectors, paragraph features and position information;
S-2B: performing semantic feature extraction on the sentence matrix to obtain a semantic feature matrix, and classifying the semantic feature matrix as a paragraph feature matrix according to its sentence head vector;
S-3B: applying a linear transformation to the paragraph feature matrix to obtain a paragraph continuous vector;
S-4B: converting the paragraph continuous vector into a binary code to obtain a paragraph binary code.
3. The answer prediction method for open domain question answering according to claim 2, wherein step S-5 comprises the steps of:
S-51: calculating the Hamming distance between the question binary code and each paragraph binary code in the paragraph index library, and screening out the m paragraphs whose Hamming distance to the question binary code is smallest;
S-52: performing an inner product operation between the question continuous vector and the m paragraphs, and screening out the K paragraphs with the largest inner product values as the supporting documents of the question, obtaining the question supporting documents.
4. The answer prediction method for open domain question answering according to any one of claims 1-3, wherein the parameters of the semantic feature extraction in step S-2A and/or step S-2B and the parameters of the linear transformation in step S-3A and/or step S-3B are obtained by:
S-1C: converting an input batch of questions and answers into word vectors respectively, and adding paragraph features and position information to the word vectors to obtain sentence matrices fusing the word vectors, paragraph features and position information;
S-2: performing semantic feature extraction on the input sentence matrices to obtain semantic feature matrices, and classifying each semantic feature matrix as a question feature matrix or a paragraph feature matrix according to its sentence head vector;
S-3A: applying a linear transformation to the question feature matrix to obtain a question continuous vector;
S-3B: applying a linear transformation to the paragraph feature matrix to obtain paragraph continuous vectors, the paragraph continuous vectors comprising paragraph positive sample continuous vectors and paragraph negative sample continuous vectors;
S-4A: converting the question continuous vector into a binary code to obtain a question binary code;
S-4B: converting the paragraph continuous vectors into binary codes to obtain paragraph binary codes, the paragraph binary codes comprising paragraph positive sample binary codes and paragraph negative sample binary codes;
SA-5: performing, through forward propagation, matrix operations on the input question continuous vector $v_q$, paragraph positive sample continuous vector $v_{p^+}$, paragraph negative sample continuous vectors $v_{p^-}$, question binary code $h_q$, paragraph positive sample binary code $h_{p^+}$ and paragraph negative sample binary codes $h_{p^-}$ according to the 4 task loss functions set for the retriever to obtain the loss values of the 4 tasks, and computing the final loss value from the loss values of the 4 tasks;
wherein the 4 task loss functions each minimize the negative log-likelihood of the paragraph positive sample, each loss function satisfying a relation of the form

$$\mathcal{L}_k = -\log \frac{\exp(\langle q, p^{+} \rangle)}{\exp(\langle q, p^{+} \rangle) + \sum_{p^{-}} \exp(\langle q, p^{-} \rangle)}, \qquad k = 1, \dots, 4,$$

in which $q$ denotes the representation of the current question ($v_q$ its continuous vector, $h_q$ its binary code), $p^{+}$ denotes the paragraph positive sample associated with the current question ($v_{p^+}$ its continuous vector, $h_{p^+}$ its binary code), and $p^{-}$ denotes the paragraph negative samples irrelevant to the current question ($v_{p^-}$ their continuous vectors, $h_{p^-}$ their binary codes);
the final loss value satisfies the relation

$$\mathcal{L} = \lambda_1 \mathcal{L}_1 + \lambda_2 \mathcal{L}_2 + \lambda_3 \mathcal{L}_3 + \lambda_4 \mathcal{L}_4,$$

in which $\lambda_1, \dots, \lambda_4$ are the weighting coefficients of the 4 loss values;
SA-6: updating the parameters of the feature extraction and the linear transformations through a back-propagation algorithm according to the final loss value;
and repeating steps S-1C to SA-6 until the loss value stabilizes, or the number of iterations is reached, or the loss falls below an iteration threshold.
5. The answer prediction method for open domain question answering according to claim 4, wherein the parameters for generating the predicted answer in step S-6 are obtained by:
SC-1: converting an input batch of questions from the question-to-second-label mapping into vectors to obtain question vectors;
SC-2: performing feature extraction on the input question vectors, and generating second predicted answers using a generate function with a greedy decoding algorithm;
SC-3: computing the cross entropy between the second predicted answer and the second label, taking the cross entropy as a second loss value, and updating the parameters of the answer generator through a back-propagation algorithm;
repeating steps SC-1 to SC-3 until the loss value stabilizes, or the number of iterations is reached, or the loss falls below an iteration threshold;
wherein the mapping of questions to second labels is obtained by:
SE-1: acquiring a corpus knowledge base of the vertical domain, and randomly intercepting, with a fixed probability, sentences of fixed length from the corpus knowledge base as the questions of the question-to-second-label mapping;
SE-2: executing steps S-3A and S-4A on the questions to obtain question continuous vectors and question binary codes;
SE-3: executing step S-5 on the question continuous vectors and question binary codes to obtain question supporting documents, which are used as the second labels;
and repeating SE-1 to SE-3 to obtain the second labels corresponding to all questions, and producing a dataset mapping questions to second labels.
6. An open domain question-answering system, characterized by comprising:
a vector converter for converting an input question into word vectors and adding paragraph features and position information to the word vectors to obtain a sentence matrix fusing the word vectors, paragraph features and position information;
a retriever comprising a feature extraction module, a question linear layer and a question hash layer, wherein the feature extraction module is used to perform semantic feature extraction on the input sentence matrix to obtain a semantic feature matrix and classify the semantic feature matrix as a question feature matrix according to its sentence head vector; the question linear layer is used to apply a linear transformation to the question feature matrix to obtain a question continuous vector; and the question hash layer is used to convert the question continuous vector into a binary code to obtain a question binary code;
a paragraph index library storing the paragraph binary codes of the documents of an existing knowledge base;
a supporting document indexer for screening out, using first the question binary code and then the question continuous vector, the K paragraphs with the largest inner product values from the paragraph index library as the supporting documents of the question, obtaining the question supporting documents;
an answer generator for converting the input question into a question vector, converting the question supporting documents into question supporting document vectors, splicing the question vector with the question supporting document vectors to obtain a question-document splice vector, and generating a predicted answer from the question-document splice vector using a generate function with a greedy decoding algorithm.
7. The open domain question-answering system according to claim 6, wherein the retriever further comprises:
a paragraph linear layer for applying a linear transformation to the paragraph feature matrix to obtain paragraph continuous vectors;
a paragraph hash layer for converting the paragraph continuous vectors into binary codes to obtain paragraph binary codes;
wherein the paragraph binary codes of the knowledge base documents stored in the paragraph index library are obtained by truncating documents acquired from a large-scale knowledge base of the vertical domain into paragraphs of a fixed number of words, and processing the paragraphs sequentially with the vector converter, the feature extraction module, the paragraph linear layer and the paragraph hash layer.
8. The open domain question-answering system according to claim 6 or 7, wherein the supporting document indexer uses the faiss library to calculate the Hamming distances between the question binary code and the paragraph binary codes in the paragraph index library and screen out the m paragraphs closest to the question binary code in Hamming distance; an inner product operation is then performed between the question continuous vector and the m paragraphs, and the K paragraphs with the largest inner product values are screened out as the supporting documents of the question, obtaining the question supporting documents.
9. The open domain question-answering system according to claim 8, wherein the parameters of the retriever are obtained by:
S-1C: converting an input batch of questions and answers into word vectors respectively, and adding paragraph features and position information to the word vectors to obtain sentence matrices fusing the word vectors, paragraph features and position information;
S-2: performing semantic feature extraction on the input sentence matrices to obtain semantic feature matrices, and classifying each semantic feature matrix as a question feature matrix or a paragraph feature matrix according to its sentence head vector;
S-3A: applying a linear transformation to the question feature matrix to obtain a question continuous vector;
S-3B: applying a linear transformation to the paragraph feature matrix to obtain paragraph continuous vectors, the paragraph continuous vectors comprising paragraph positive sample continuous vectors and paragraph negative sample continuous vectors;
S-4A: converting the question continuous vector into a binary code to obtain a question binary code;
S-4B: converting the paragraph continuous vectors into binary codes to obtain paragraph binary codes, the paragraph binary codes comprising paragraph positive sample binary codes and paragraph negative sample binary codes;
SA-5: performing, through forward propagation, matrix operations on the input question continuous vector $v_q$, paragraph positive sample continuous vector $v_{p^+}$, paragraph negative sample continuous vectors $v_{p^-}$, question binary code $h_q$, paragraph positive sample binary code $h_{p^+}$ and paragraph negative sample binary codes $h_{p^-}$ according to the 4 task loss functions set for the retriever to obtain the loss values of the 4 tasks, and computing the final loss value from the loss values of the 4 tasks;
wherein the 4 task loss functions each minimize the negative log-likelihood of the paragraph positive sample, each loss function satisfying a relation of the form

$$\mathcal{L}_k = -\log \frac{\exp(\langle q, p^{+} \rangle)}{\exp(\langle q, p^{+} \rangle) + \sum_{p^{-}} \exp(\langle q, p^{-} \rangle)}, \qquad k = 1, \dots, 4,$$

in which $q$ denotes the representation of the current question ($v_q$ its continuous vector, $h_q$ its binary code), $p^{+}$ denotes the paragraph positive sample associated with the current question ($v_{p^+}$ its continuous vector, $h_{p^+}$ its binary code), and $p^{-}$ denotes the paragraph negative samples irrelevant to the current question ($v_{p^-}$ their continuous vectors, $h_{p^-}$ their binary codes);
the final loss value satisfies the relation

$$\mathcal{L} = \lambda_1 \mathcal{L}_1 + \lambda_2 \mathcal{L}_2 + \lambda_3 \mathcal{L}_3 + \lambda_4 \mathcal{L}_4,$$

in which $\lambda_1, \dots, \lambda_4$ are the weighting coefficients of the 4 loss values;
SA-6: updating the parameters of the feature extraction and the linear transformations through a back-propagation algorithm according to the final loss value;
and repeating steps S-1C to SA-6 until the loss value stabilizes, or the number of iterations is reached, or the loss falls below an iteration threshold.
10. The open domain question-answering system according to claim 9, wherein the parameters of the answer generator are obtained by:
SC-1: converting an input batch of questions from the question-to-second-label mapping into vectors to obtain question vectors;
SC-2: performing feature extraction on the input question vectors, and generating second predicted answers using a generate function with a greedy decoding algorithm;
SC-3: computing the cross entropy between the second predicted answer and the second label, taking the cross entropy as a second loss value, and updating the parameters of the answer generator through a back-propagation algorithm;
repeating steps SC-1 to SC-3 until the loss value stabilizes, or the number of iterations is reached, or the loss falls below an iteration threshold;
wherein the mapping of questions to second labels is obtained by:
SE-1: acquiring a corpus knowledge base of the vertical domain, and randomly intercepting, with a fixed probability, sentences of fixed length from the corpus knowledge base as the questions of the question-to-second-label mapping;
SE-2: executing steps S-3A and S-4A on the questions to obtain question continuous vectors and question binary codes;
SE-3: executing step S-5 on the question continuous vectors and question binary codes to obtain question supporting documents, which are used as the second labels;
and repeating SE-1 to SE-3 to obtain the second labels corresponding to all questions, and producing a dataset mapping questions to second labels.
CN202310389053.6A 2023-04-13 2023-04-13 Open domain question-answering system and answer prediction method Active CN116108128B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310389053.6A CN116108128B (en) 2023-04-13 2023-04-13 Open domain question-answering system and answer prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310389053.6A CN116108128B (en) 2023-04-13 2023-04-13 Open domain question-answering system and answer prediction method

Publications (2)

Publication Number Publication Date
CN116108128A (en) 2023-05-12
CN116108128B (en) 2023-09-05

Family

ID=86260157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310389053.6A Active CN116108128B (en) 2023-04-13 2023-04-13 Open domain question-answering system and answer prediction method

Country Status (1)

Country Link
CN (1) CN116108128B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116975206A (en) * 2023-09-25 2023-10-31 华云天下(南京)科技有限公司 Vertical field training method and device based on AIGC large model and electronic equipment
CN117312506A (en) * 2023-09-07 2023-12-29 广州风腾网络科技有限公司 Page semantic information extraction method and system

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109885672A (en) * 2019-03-04 2019-06-14 中国科学院软件研究所 A kind of question and answer mode intelligent retrieval system and method towards online education
CN110879838A (en) * 2019-10-29 2020-03-13 中科能效(北京)科技有限公司 Open domain question-answering system
CN111159340A (en) * 2019-12-24 2020-05-15 重庆兆光科技股份有限公司 Answer matching method and system for machine reading understanding based on random optimization prediction
CN111325103A (en) * 2020-01-21 2020-06-23 华南师范大学 Cell labeling system and method
CN111966810A (en) * 2020-09-02 2020-11-20 中国矿业大学(北京) Question-answer pair ordering method for question-answer system
US20210216576A1 (en) * 2020-01-14 2021-07-15 RELX Inc. Systems and methods for providing answers to a query
CN113868379A (en) * 2021-10-09 2021-12-31 中国科学院声学研究所 Paragraph selection method, device, equipment and storage medium for open domain question answering
CN114416914A (en) * 2022-03-30 2022-04-29 中建电子商务有限责任公司 Processing method based on picture question and answer
US20220309949A1 (en) * 2020-04-24 2022-09-29 Samsung Electronics Co., Ltd. Device and method for providing interactive audience simulation
CN115408987A (en) * 2022-06-15 2022-11-29 北京理工大学 Long text reading understanding method based on sentence-level document segmentation

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109885672A (en) * 2019-03-04 2019-06-14 中国科学院软件研究所 A kind of question and answer mode intelligent retrieval system and method towards online education
CN110879838A (en) * 2019-10-29 2020-03-13 中科能效(北京)科技有限公司 Open domain question-answering system
CN111159340A (en) * 2019-12-24 2020-05-15 重庆兆光科技股份有限公司 Answer matching method and system for machine reading understanding based on random optimization prediction
US20210216576A1 (en) * 2020-01-14 2021-07-15 RELX Inc. Systems and methods for providing answers to a query
CN111325103A (en) * 2020-01-21 2020-06-23 华南师范大学 Cell labeling system and method
US20220309949A1 (en) * 2020-04-24 2022-09-29 Samsung Electronics Co., Ltd. Device and method for providing interactive audience simulation
CN111966810A (en) * 2020-09-02 2020-11-20 中国矿业大学(北京) Question-answer pair ordering method for question-answer system
CN113868379A (en) * 2021-10-09 2021-12-31 中国科学院声学研究所 Paragraph selection method, device, equipment and storage medium for open domain question answering
CN114416914A (en) * 2022-03-30 2022-04-29 中建电子商务有限责任公司 Processing method based on picture question and answer
CN115408987A (en) * 2022-06-15 2022-11-29 北京理工大学 Long text reading understanding method based on sentence-level document segmentation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
余传明; 王曼怡; 林虹君; 朱星宇; 黄婷婷; 安璐, "A Comparative Study of Word Representation Models Based on Deep Learning", Data Analysis and Knowledge Discovery, no. 08
刘葛泓; 李金泽; 李卞婷; 邵南青; 窦万峰, "Research on an Intelligent Contract-Law Question-Answering System Based on Joint Text-CNN Classification and Matching", Software Engineering, no. 06
孙浩 et al., "Research on Cross-Modal Retrieval Methods Based on Multi-Layer Attention Mechanisms", CNKI

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117312506A (en) * 2023-09-07 2023-12-29 广州风腾网络科技有限公司 Page semantic information extraction method and system
CN117312506B (en) * 2023-09-07 2024-03-08 广州风腾网络科技有限公司 Page semantic information extraction method and system
CN116975206A (en) * 2023-09-25 2023-10-31 华云天下(南京)科技有限公司 Vertical field training method and device based on AIGC large model and electronic equipment
CN116975206B (en) * 2023-09-25 2023-12-08 华云天下(南京)科技有限公司 Vertical field training method and device based on AIGC large model and electronic equipment

Also Published As

Publication number Publication date
CN116108128B (en) 2023-09-05

Similar Documents

Publication Publication Date Title
Guu et al. Retrieval augmented language model pre-training
CN116108128B (en) Open domain question-answering system and answer prediction method
US11544474B2 (en) Generation of text from structured data
CN109271514B (en) Generation method, classification method, device and storage medium of short text classification model
CN114329109B (en) Multimodal retrieval method and system based on weakly supervised Hash learning
CN112364624B (en) Keyword extraction method based on deep learning language model fusion semantic features
US11120214B2 (en) Corpus generating method and apparatus, and human-machine interaction processing method and apparatus
CN114861889B (en) Deep learning model training method, target object detection method and device
CN116775847A (en) Question answering method and system based on knowledge graph and large language model
CN113704667B (en) Automatic extraction processing method and device for bid announcement
CN110795541A (en) Text query method and device, electronic equipment and computer readable storage medium
CN116303977B (en) Question-answering method and system based on feature classification
CN114298157A (en) Short text sentiment classification method, medium and system based on public sentiment big data analysis
CN113656700A (en) Hash retrieval method based on multi-similarity consistent matrix decomposition
CN116932730B (en) Document question-answering method and related equipment based on multi-way tree and large-scale language model
CN113128431B (en) Video clip retrieval method, device, medium and electronic equipment
CN113934869A (en) Database construction method, multimedia file retrieval method and device
CN113159187A (en) Classification model training method and device, and target text determining method and device
CN112487263A (en) Information processing method, system, equipment and computer readable storage medium
CN111782810A (en) Text abstract generation method based on theme enhancement
CN113505196B (en) Text retrieval method and device based on parts of speech, electronic equipment and storage medium
CN114692610A (en) Keyword determination method and device
CN114398489A (en) Entity relation joint extraction method, medium and system based on Transformer
CN113641790A (en) Cross-modal retrieval model based on distinguishing representation depth hash
Li et al. Similarity search algorithm over data supply chain based on key points

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant