CN115470332A - Intelligent question-answering system for content matching based on matching degree - Google Patents
- Publication number: CN115470332A
- Application number: CN202211074234.1A
- Authority: CN (China)
- Prior art keywords: answer, matching degree, candidate, query content, determining
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/3329 — Information retrieval of unstructured textual data; querying; query formulation; natural language query formulation or dialogue systems
- G06F16/3344 — Information retrieval of unstructured textual data; querying; query processing; query execution using natural language analysis
- G06F16/367 — Information retrieval of unstructured textual data; creation of semantic tools; ontology
- G06F40/30 — Handling natural language data; semantic analysis
Abstract
The invention discloses an intelligent question-answering system for content matching based on matching degree, together with a method and an apparatus for such matching, wherein the method includes the following steps: acquiring format-processed query content; determining a candidate-paragraph matching degree between the format-processed query content and each text paragraph, and determining text paragraphs whose matching degree is greater than a first matching-degree threshold as candidate paragraphs; selecting, in each candidate paragraph, an answer segment associated with the format-processed query content, and determining an answer-segment matching degree between the format-processed query content and each answer segment; determining an overall matching degree between the format-processed query content and each answer segment based on the candidate-paragraph matching degree and the answer-segment matching degree; and selecting, from the plurality of answer segments, at least one target sub-paragraph associated with the format-processed query content based on the overall matching degree.
Description
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to an intelligent question-answering system for content matching based on matching degree, and a method and a device for content matching based on matching degree.
Background
A question-answering system based on knowledge-graph technology requires that the specialist knowledge of the target field be expressed as a knowledge graph, and that a user's unstructured question be converted into a structured graph query statement. Two common approaches are semantic parsing and path retrieval: the former performs semantic analysis of the user's question and converts it directly into a graph query statement, so that the answer is obtained by querying; the latter is better suited to complex questions, can provide a multi-hop search path for the question, and is highly interpretable. However, constructing a knowledge graph of the expertise of a particular target field is itself no simple matter, so the prerequisites of such prior-art solutions are demanding and hard to meet.
Question-answer pair retrieval technology first requires that all specialist knowledge of the target field be organized into question-answer pairs and stored in advance as a question-answer pair library. A user's question is then matched against the questions in the library, and the answer of the best-matching pair is returned. The approach is simple and direct, but answer quality depends entirely on the pre-stored pairs, and building the question-answer pair library can be a very expensive undertaking.
Accordingly, there is a need in the art for an intelligent question and answer system.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an intelligent question-answering system based on a re-ranking reading-comprehension algorithm, which can intelligently process various types of documents of a target system.
The invention relates to a dialogue system for specialist knowledge such as rules and regulations of various types, whose answer space is relatively closed, as distinct from chit-chat and command-oriented dialogue systems. Question-answering systems with such knowledge-query characteristics mainly rely on knowledge-graph technology, question-answer pair retrieval technology, document question-answering technology, and the like.
The technical scheme provided by the invention differs from the main prior-art technologies; it chiefly involves natural language understanding of the question and knowledge matching. The system first trains, offline, a re-ranking system over multiple documents. In the first step, the documents are divided into paragraphs; a pre-trained BERT network encodes the paragraphs and typical questions and is trained with a dedicated loss function; text matching is then performed between document paragraphs and typical questions, a threshold is set, and paragraph-question pairs with low matching degree are filtered out, leaving candidate paragraph-question matching pairs. In the second step, another pre-trained BERT network encodes the candidate paragraph-question matching pairs and is trained, with a separate cross-entropy-based loss function, to predict the start and end positions of the exact answer segment contained in each paragraph, i.e., to extract from the characters of the matched paragraph an answer that exactly matches the question. This training process is completed offline in advance.
At inference time, the trained system ranks candidate answers to the user's question online. The ranking criterion combines the results of the two steps, namely the matching degree between the user's question and each candidate paragraph and the matching degree between the user's question and each candidate answer: the latter is log-smoothed and multiplied by the former, all candidate answers are ranked by the result, and the top N answers in the ranking are returned.
According to one aspect of the present invention, there is provided a method for content matching based on matching degree, the method including:
acquiring original query content input by a user, and performing format processing on the original query content to obtain format-processed query content;
determining a candidate-paragraph matching degree between the format-processed query content and each text paragraph among a plurality of text paragraphs in a text content library, and determining text paragraphs whose candidate-paragraph matching degree is greater than a first matching-degree threshold as candidate paragraphs;
selecting, in each candidate paragraph, an answer segment associated with the format-processed query content, and determining an answer-segment matching degree between the format-processed query content and each answer segment;
determining an overall matching degree between the format-processed query content and each answer segment based on the candidate-paragraph matching degree and the answer-segment matching degree; and
selecting, from the plurality of answer segments, at least one target sub-paragraph associated with the format-processed query content based on the overall matching degree.
Preferably, performing format processing on the original query content to obtain the format-processed query content includes:
acquiring a content processing rule for format processing of the original query content; and
performing format processing on the original query content based on the content processing rule to obtain the format-processed query content.
Preferably, before acquiring the original query content input by the user, the method further includes:
segmenting each document among the plurality of documents in the text content library by natural paragraph to obtain a plurality of natural paragraphs; and
determining the headings at each level in each document, and forming each heading, together with at least one natural paragraph associated with it, into a text paragraph.
Preferably, the method further includes:
determining the number of characters in each text paragraph;
determining text paragraphs whose character count exceeds a character-count threshold as text paragraphs to be processed; and
splitting each text paragraph to be processed until the character count of every resulting text paragraph is less than or equal to the character-count threshold.
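The length-limiting step above can be sketched as follows; the character threshold and the midpoint split strategy are illustrative assumptions, since the patent fixes neither:

```python
def split_long_paragraphs(paragraphs, max_chars=512):
    """Recursively split paragraphs until every piece is at or below the
    character threshold. The midpoint split is a naive stand-in; a real
    system might prefer to split at sentence boundaries."""
    result = []
    for p in paragraphs:
        if len(p) <= max_chars:
            result.append(p)
        else:
            mid = len(p) // 2
            # recurse on both halves until every piece fits the threshold
            result.extend(split_long_paragraphs([p[:mid], p[mid:]], max_chars))
    return result
```

Splitting preserves the full text: concatenating the pieces reproduces the original paragraph.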
Preferably, determining the candidate-paragraph matching degree between the format-processed query content and each text paragraph among the plurality of text paragraphs in the text content library includes:
determining a semantic feature encoding u_q of the format-processed query content query using a pre-trained BERT language-representation model Bert_1:
u_q = Bert_1(query)
determining a semantic feature encoding u_{p_j} of each text paragraph p_j using the pre-trained model Bert_1; and
calculating the candidate-paragraph matching degree between the format-processed query content and the j-th text paragraph among the plurality of text paragraphs in the text content library;
wherein 0 < j ≤ na, j is a natural number, and na is the number of text paragraphs in the text content library.
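A minimal sketch of this first-stage filtering follows. The real system computes u_q and u_{p_j} with a fine-tuned Bert_1 encoder; here a hypothetical bag-of-characters embedding stands in for it, and cosine similarity is an assumption for the matching degree, whose exact formula the patent gives by reference:

```python
import math

def embed(text, dim=64):
    """Stand-in for Bert_1: a bag-of-characters vector (illustrative only)."""
    v = [0.0] * dim
    for ch in text:
        v[ord(ch) % dim] += 1.0
    return v

def cosine(u, v):
    """Cosine similarity, used here as an assumed matching degree."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def candidate_paragraphs(query, paragraphs, threshold=0.5):
    """Keep only paragraphs whose matching degree with the query exceeds
    the first matching-degree threshold (returns (paragraph, score) pairs)."""
    u_q = embed(query)
    return [(p, s) for p in paragraphs
            if (s := cosine(u_q, embed(p))) > threshold]
```

Swapping `embed` for real BERT sentence embeddings leaves the filtering logic unchanged.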
Preferably, when determining the candidate-paragraph matching degree between the format-processed query content and each text paragraph among the plurality of text paragraphs in the text content library, and determining text paragraphs whose matching degree is greater than the first matching-degree threshold as candidate paragraphs, the following loss function is involved:
wherein λ is a hyper-parameter, Ω⁻ is the set of documents irrelevant to the format-processed query content query, and Ω⁺ is the set of documents relevant to the format-processed query content query.
Preferably, after determining text paragraphs whose candidate-paragraph matching degree is greater than the first matching-degree threshold as candidate paragraphs, the candidate paragraphs are formed into a candidate-paragraph set.
preferably, the selecting an answer segment associated with the formatted query content in each candidate passage comprises:
language characterization model Bert pre-trained using Bert 2 Determining semantic feature encodings u for answer fragments associated with the format-processed query content qj :
u qj =Bert 2 (concat(query,p j ))
Determining the starting position I of the answer segment in the candidate paragraph start And an end position I end :
Wherein,is a weight matrix of the starting position,to weight matrix with end position, softmax is the activation function, P start As starting position probability, P end To end position probability, len (p) j ) Is p j The character length of (2);
based on the starting position I start And an end position I end At each candidate paragraph p j To select an answer segment associated with the formatted query content.
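The start/end decoding can be sketched as below. Taking P_start = softmax of the start logits (i.e., of W_start applied to the encoding) and decoding I_start, I_end by argmax is an assumption consistent with the symbols named above (W_start, W_end, softmax, P_start, P_end), not the patent's exact formulas:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def select_span(start_logits, end_logits):
    """Decode I_start and I_end as the argmaxes of the start/end
    probability distributions, constraining I_start <= I_end."""
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    i_start = max(range(len(p_start)), key=p_start.__getitem__)
    # restrict the end search to positions at or after the start
    i_end = max(range(i_start, len(p_end)), key=p_end.__getitem__)
    return i_start, i_end
```

The answer segment is then the slice p_j[i_start : i_end + 1] of the candidate paragraph.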
Preferably, in selecting the answer segment associated with the format-processed query content in each candidate paragraph, the following loss function is involved:
L = αCE(P_start, Label_start) + βCE(P_end, Label_end) + γCE(P_span, Label_span)
where CE denotes the cross-entropy loss function, Label_start is the start position of the standard answer label, Label_end is the end position of the standard answer label, Label_span denotes the answer segment of the standard answer label from the start position to the end position, and α, β, γ are hyper-parameters.
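The three-term loss L = αCE(P_start, Label_start) + βCE(P_end, Label_end) + γCE(P_span, Label_span) can be written out directly for one-hot labels; the hyper-parameter values shown are illustrative, not the patent's:

```python
import math

def cross_entropy(probs, label_idx):
    """CE for a one-hot label: -log of the probability at the label index
    (clamped to avoid log(0))."""
    return -math.log(max(probs[label_idx], 1e-12))

def span_loss(p_start, p_end, p_span, labels, alpha=1.0, beta=1.0, gamma=0.5):
    """L = alpha*CE(P_start) + beta*CE(P_end) + gamma*CE(P_span)."""
    return (alpha * cross_entropy(p_start, labels["start"])
            + beta * cross_entropy(p_end, labels["end"])
            + gamma * cross_entropy(p_span, labels["span"]))
```

A perfect prediction (probability 1 at each label position) yields zero loss; any miscalibration makes each term strictly positive.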
Preferably, determining the answer-segment matching degree between the format-processed query content and each answer segment includes:
determining a semantic feature encoding u_{a_j} of the answer segment of the j-th candidate paragraph using the pre-trained BERT language-representation model Bert_1; and
determining the answer-segment matching degree between the format-processed query content u_q and the j-th answer segment;
wherein a_j is the answer segment of the j-th candidate paragraph.
Preferably, determining the overall matching degree between the format-processed query content and each answer segment based on the candidate-paragraph matching degree and the answer-segment matching degree includes:
performing logarithmic smoothing on the answer-segment matching degree to obtain a smoothed matching degree; and
determining the overall matching degree s between the format-processed query content and the answer segment based on the candidate-paragraph matching degree and the smoothed matching degree;
where f is the logarithmic smoothing function.
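Per the scheme above, the overall matching degree multiplies the candidate-paragraph matching degree by the log-smoothed answer-segment matching degree. `log1p` is assumed here for the smoothing function f, which the patent leaves unspecified:

```python
import math

def overall_score(paragraph_score, answer_score):
    """s = s_p * f(s_a), taking f as log1p smoothing (an assumption);
    log1p damps large answer scores while keeping f(0) = 0."""
    return paragraph_score * math.log1p(answer_score)
```

Because log1p is monotonic, smoothing changes the relative weight of the answer-segment score without reordering answers that share the same paragraph score.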
Preferably, selecting, from the plurality of answer segments, at least one target sub-paragraph associated with the format-processed query content based on the overall matching degree includes:
sorting the answer segments in descending order of their overall matching degree with the format-processed query content to generate a sorted list;
acquiring a preset extraction parameter N, and selecting, from the sorted list, the N answer segments with the highest matching degree; and
determining, among the N answer segments with the highest matching degree, at least one answer segment whose matching degree is greater than a second matching-degree threshold as a target sub-paragraph.
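The final selection step, sketched directly from the description (sort descending, take the top N, then apply the second matching-degree threshold):

```python
def select_targets(scored_answers, n, second_threshold):
    """scored_answers: list of (answer_text, overall_matching_degree) pairs.
    Returns the target sub-paragraphs: top-n answers by score that also
    exceed the second matching-degree threshold."""
    ranked = sorted(scored_answers, key=lambda pair: pair[1], reverse=True)
    top_n = ranked[:n]
    return [answer for answer, score in top_n if score > second_threshold]
```

Note the threshold is applied after truncation to N, so fewer than N answers may be returned, including none.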
According to another aspect of the present invention, there is provided an apparatus for content matching based on matching degree, the apparatus including:
a processing unit configured to acquire original query content input by a user and perform format processing on it to obtain format-processed query content;
a first determining unit configured to determine a candidate-paragraph matching degree between the format-processed query content and each text paragraph among a plurality of text paragraphs in a text content library, and to determine text paragraphs whose matching degree is greater than a first matching-degree threshold as candidate paragraphs;
a second determining unit configured to select, in each candidate paragraph, an answer segment associated with the format-processed query content, and to determine an answer-segment matching degree between the format-processed query content and each answer segment;
a third determining unit configured to determine an overall matching degree between the format-processed query content and each answer segment based on the candidate-paragraph matching degree and the answer-segment matching degree; and
a selecting unit configured to select, from the plurality of answer segments, at least one target sub-paragraph associated with the format-processed query content based on the overall matching degree.
According to another aspect of the present invention, there is provided a computer-readable storage medium, wherein the storage medium stores a computer program for executing the method according to any of the above embodiments.
According to another aspect of the present invention, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
the processor is used for reading the executable instructions from the memory and executing the instructions to realize the method of any one of the above embodiments.
According to yet another aspect of the embodiments of the present disclosure, there is provided a computer program product including computer-readable code which, when run on a device, causes a processor in the device to execute the method of any of the above embodiments.
The innovation of the invention lies mainly in two points. First, answers to a given user question are screened by combining two similarity computations: the first step performs text matching, and the second step extracts answer segments from the candidate paragraphs with a reading-comprehension algorithm, so the re-ranking considers both text matching and reading comprehension. Second, a dedicated loss function is used to train the matching degree between the question and the candidate paragraphs.
The main advantages of the invention follow from these two innovations. The re-ranking method jointly considers the text matching degree between the user question and each candidate paragraph and the similarity between the user question and the exact answer within that paragraph, improving the accuracy and stability of answer screening; the loss function used when training the first-step question-paragraph matching network ensures that paragraphs relevant to the question are selected accurately.
Drawings
A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:
FIG. 1 is a flowchart of a method for content matching based on matching degree according to an embodiment of the present invention;
FIG. 2 is a flowchart of a reading-comprehension method based on multi-document re-ranking according to an embodiment of the present invention;
FIG. 3 is a model diagram of a reading-comprehension algorithm based on multi-document re-ranking according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for content matching based on matching degree according to an embodiment of the present invention.
Detailed Description
Fig. 1 is a flowchart of a method for content matching based on matching degree according to an embodiment of the present invention. As shown in fig. 1, the method 100 includes steps 101 to 105, described below together with optional embodiments.
In one embodiment, performing format processing on the original query content to obtain the format-processed query content includes: acquiring a content processing rule for format processing of the original query content; and performing format processing on the original query content based on the content processing rule to obtain the format-processed query content.
In one embodiment, before acquiring the original query content input by the user, the method further includes: segmenting each document among the plurality of documents in the text content library by natural paragraph to obtain a plurality of natural paragraphs; and determining the headings at each level in each document, and forming each heading, together with at least one natural paragraph associated with it, into a text paragraph.
In one embodiment, the method further includes: determining the number of characters in each text paragraph; determining text paragraphs whose character count exceeds a character-count threshold as text paragraphs to be processed; and splitting each text paragraph to be processed until the character count of every resulting text paragraph is less than or equal to the character-count threshold.
In one embodiment, determining the candidate-paragraph matching degree between the format-processed query content and each text paragraph among the plurality of text paragraphs in the text content library includes:
determining a semantic feature encoding u_q of the format-processed query content query using a pre-trained BERT language-representation model Bert_1:
u_q = Bert_1(query)
determining a semantic feature encoding u_{p_j} of each text paragraph p_j using the pre-trained model Bert_1; and
calculating the candidate-paragraph matching degree between the format-processed query content and the j-th text paragraph among the plurality of text paragraphs in the text content library;
wherein 0 < j ≤ na, j is a natural number, and na is the number of text paragraphs in the text content library.
In one embodiment, when determining the candidate-paragraph matching degree between the format-processed query content and each text paragraph in the text content library, and determining text paragraphs whose matching degree is greater than the first matching-degree threshold as candidate paragraphs, the following loss function is involved:
wherein λ is a hyper-parameter, Ω⁻ is the set of documents irrelevant to the format-processed query content query, and Ω⁺ is the set of documents relevant to the format-processed query content query.
In one embodiment, after determining text paragraphs whose candidate-paragraph matching degree is greater than the first matching-degree threshold as candidate paragraphs, the candidate paragraphs are formed into a candidate-paragraph set.
In one embodiment, selecting, in each candidate paragraph, the answer segment associated with the format-processed query content includes:
determining a semantic feature encoding u_{qj} of the answer segment associated with the format-processed query content using a pre-trained BERT language-representation model Bert_2:
u_{qj} = Bert_2(concat(query, p_j))
determining the start position I_start and end position I_end of the answer segment within the candidate paragraph;
wherein W_start is the weight matrix for the start position, W_end is the weight matrix for the end position, softmax is the activation function, P_start is the start-position probability, P_end is the end-position probability, and len(p_j) is the character length of p_j; and
selecting, based on the start position I_start and the end position I_end, the answer segment associated with the format-processed query content in each candidate paragraph p_j.
In one embodiment, in selecting the answer segment associated with the format-processed query content in each candidate paragraph, the following loss function is involved:
L = αCE(P_start, Label_start) + βCE(P_end, Label_end) + γCE(P_span, Label_span)
where CE denotes the cross-entropy loss function, Label_start is the start position of the standard answer label, Label_end is the end position of the standard answer label, Label_span denotes the answer segment of the standard answer label from the start position to the end position, and α, β, γ are hyper-parameters.
Step 104: determining the overall matching degree between the format-processed query content and each answer segment based on the candidate-paragraph matching degree and the answer-segment matching degree.
In one embodiment, determining the answer-segment matching degree between the format-processed query content and each answer segment includes:
determining a semantic feature encoding u_{a_j} of the answer segment of the j-th candidate paragraph using the pre-trained BERT language-representation model Bert_1;
wherein a_j is the answer segment of the j-th candidate paragraph.
In one embodiment, determining the overall matching degree between the format-processed query content and each answer segment based on the candidate-paragraph matching degree and the answer-segment matching degree includes:
performing logarithmic smoothing on the answer-segment matching degree to obtain a smoothed matching degree; and
determining the overall matching degree s between the format-processed query content and the answer segment based on the candidate-paragraph matching degree and the smoothed matching degree;
where f is the logarithmic smoothing function.
Step 105: selecting, from the plurality of answer segments, at least one target sub-paragraph associated with the format-processed query content based on the overall matching degree.
In one embodiment, selecting, from the plurality of answer segments, at least one target sub-paragraph associated with the format-processed query content based on the overall matching degree includes:
sorting the answer segments in descending order of their overall matching degree with the format-processed query content to generate a sorted list;
acquiring a preset extraction parameter N, and selecting, from the sorted list, the N answer segments with the highest matching degree; and
determining, among the N answer segments with the highest matching degree, at least one answer segment whose matching degree is greater than a second matching-degree threshold as a target sub-paragraph.
According to an alternative, the method includes:
Step 1011: acquiring original query content input by a user, and performing format processing on it to obtain format-processed query content.
Step 1012: determining a text matching degree between the format-processed query content and each text paragraph among a plurality of text paragraphs in a text content library, and determining text paragraphs whose text matching degree is greater than a first matching-degree threshold as candidate paragraphs.
Step 1013: selecting, in each candidate paragraph, a result sub-paragraph associated with the format-processed query content, and determining a result matching degree between the format-processed query content and each result sub-paragraph.
Step 1014: determining an overall matching degree between the format-processed query content and each result sub-paragraph based on the text matching degree and the result matching degree.
Step 1015: selecting, from the plurality of result sub-paragraphs, at least one target sub-paragraph associated with the format-processed query content based on the overall matching degree.
The format processing of the original query content to obtain the format-processed query content includes:
acquiring a content processing rule for format processing of original query content;
and performing format processing on the original query content based on the content processing rule to obtain the query content subjected to format processing.
Before acquiring the original query content input by the user, the method also includes:
segmenting each document among the plurality of documents in the text content library by natural paragraph to obtain a plurality of natural paragraphs; and
determining the headings at each level in each document, and forming each heading, together with at least one natural paragraph associated with it, into a text paragraph.
The method further includes: determining the number of characters in each text paragraph;
determining text paragraphs whose character count exceeds a character-count threshold as text paragraphs to be processed; and
splitting each text paragraph to be processed until the character count of every resulting text paragraph is less than or equal to the character-count threshold.
Determining a text matching degree between the query content subjected to format processing and each text passage in a plurality of text passages in the text content library comprises the following steps:
using the Bert pre-trained language characterization model Bert_1 to determine the semantic feature encoding u_q of the format-processed query content query:
u_q = Bert_1(query)
using the Bert pre-trained language characterization model Bert_1 to determine the semantic feature encoding of each text paragraph p_j; and
calculating the text matching degree between the format-processed query content and the jth text paragraph of the plurality of text paragraphs in the text content library,
where 0 < j ≤ na, j is a natural number, and na is the number of text paragraphs in the text content library.
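The text matching degree formula itself is not reproduced above (it was a formula image); the sketch below assumes cosine similarity between the two Bert_1 encodings, which is one common choice, with the encodings represented as plain lists of floats:

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two encoding vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_paragraphs(u_q: list[float],
                    paragraph_encodings: list[list[float]]):
    """Return (paragraph index, text matching degree) pairs sorted in
    descending order of matching degree."""
    scores = [(j, cosine(u_q, u_pj))
              for j, u_pj in enumerate(paragraph_encodings)]
    return sorted(scores, key=lambda t: t[1], reverse=True)
```
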
When determining the text matching degree between the format-processed query content and each text paragraph of the plurality of text paragraphs in the text content library, and determining the text paragraphs whose text matching degree is greater than a first matching degree threshold as candidate paragraphs, the following loss function is involved:
where λ is a hyper-parameter, Ω- is the set of documents irrelevant to the format-processed query content query, and Ω+ is the set of documents relevant to the format-processed query content query.
After determining the text paragraphs with the text matching degree larger than the first matching degree threshold value as candidate paragraphs, forming the candidate paragraphs into a candidate paragraph set:
selecting a result sub-paragraph associated with the formatted query content in each of the candidate paragraphs, comprising:
using the Bert pre-trained language characterization model Bert_2 to determine the semantic feature encoding u_qj of the result sub-paragraph associated with the format-processed query content:
u_qj = Bert_2(concat(query, p_j))
determining the starting position I_start and the ending position I_end at which the result sub-paragraph falls within the candidate paragraph:
where W_s is the weight matrix of the starting position, W_e is the weight matrix of the ending position, softmax is the activation function, P_start is the starting-position probability, P_end is the ending-position probability, and len(p_j) is the character length of p_j;
based on the starting position I_start and the ending position I_end, selecting the result sub-paragraph associated with the format-processed query content in each candidate paragraph p_j.
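The start/end prediction step can be sketched as follows, assuming the standard softmax-plus-argmax span extraction over per-position logits; the constraint that the end position not precede the start position is an added assumption, not stated in the text:

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_span(start_logits: list[float], end_logits: list[float]):
    """Return (I_start, I_end, P_start[I_start], P_end[I_end]), taking each
    position as the argmax of its softmax distribution and forcing the end
    to lie at or after the start (assumed constraint)."""
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    i_start = max(range(len(p_start)), key=lambda i: p_start[i])
    i_end = max(range(i_start, len(p_end)), key=lambda i: p_end[i])
    return i_start, i_end, p_start[i_start], p_end[i_end]
```
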
In selecting the result sub-paragraph associated with the formatted query content in each candidate paragraph, the following penalty function is involved:
L = αCE(P_start, Label_start) + βCE(P_end, Label_end) + γCE(P_span, Label_span)
where CE denotes the cross-entropy loss function, Label_start is the starting position of the standard answer label, Label_end is the ending position of the standard answer label, Label_span is the answer segment of the standard answer label from starting to ending position, and α, β, γ are hyper-parameters.
In one embodiment, the method further includes filtering the candidate answers (e.g., the candidate answers or answer segments in the candidate paragraphs). Specifically, the matching degree of each candidate answer (also called its score, matching value, or matching score) is determined according to the above formula for the starting position I_start and the ending position I_end at which the result sub-paragraph falls within the candidate paragraph; the matching degree is taken as the average of the starting-position and ending-position probabilities, and only the candidate answers meeting a specific threshold t2 are filtered and retained.
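A minimal sketch of this filtering step, assuming candidates arrive as (answer text, P_start, P_end) triples; the tuple layout is an illustrative assumption:

```python
def filter_candidate_answers(candidates, t2: float):
    """Keep candidates whose matching degree -- the mean of the start and end
    position probabilities, as described above -- meets the threshold t2.
    candidates: iterable of (answer_text, p_start, p_end)."""
    kept = []
    for text, p_start, p_end in candidates:
        degree = (p_start + p_end) / 2.0
        if degree >= t2:
            kept.append((text, degree))
    return kept
```
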
In one embodiment, the method further includes predicting the association probability between the query content query and the candidate answers: after the answer segments of the candidate paragraphs are predicted, the association probability between the query content query and the candidate answer segments is predicted.
Determining semantic feature encodings (e.g., using the Bert pre-trained language characterization model Bert_3 to determine the global semantic feature encoding and the character-level semantic feature encoding):
[H_cls, H_tokens] = Bert_3([query, answer])
where H_cls is the global semantic feature encoding (Global Embedding);
H_tokens is the character-level semantic feature encoding (token-level Embedding);
answer is a candidate answer taken from the candidate answer set; and
query is the query content.
In one embodiment, the method further includes decomposing the feature hierarchy (the feature hierarchy of the model for predicting the association probability between the query content query and the candidate answer): the features are hierarchically decomposed into an intent layer, a core entity layer, and a relation layer. The core entity layer masks the character encodings of non-core entities in the query content query and the candidate answer; the relation layer masks the character encodings of core entities in the query content query and the candidate answer; the intent layer retains the full character encoding. On this basis, the three layers of character-string encodings undergo matrix transformation and, after an average pooling layer, are expressed respectively as follows:
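The three-layer masking can be sketched as follows. Representing character encodings as lists of floats and zeroing out masked positions are assumptions; the text only states which character encodings each layer masks and that the intent layer keeps everything:

```python
def build_layer_encodings(token_encodings: list[list[float]],
                          core_entity_mask: list[bool]):
    """token_encodings: one vector per character; core_entity_mask: True where
    the character belongs to a core entity. Returns the intent, core-entity,
    and relation layer encodings, with masked positions zeroed out."""
    zero = [0.0] * len(token_encodings[0])
    intent = [v[:] for v in token_encodings]          # full encoding retained
    core = [v[:] if is_core else zero[:]              # mask non-core entities
            for v, is_core in zip(token_encodings, core_entity_mask)]
    relation = [zero[:] if is_core else v[:]          # mask core entities
                for v, is_core in zip(token_encodings, core_entity_mask)]
    return intent, core, relation

def mean_pool(vectors: list[list[float]]) -> list[float]:
    """Average pooling over the character dimension."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]
```
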
determining the probability distribution of the hierarchical features:
where σ is the activation function, W_1 is a trainable parameter, and y_u is the label of the hierarchical probability distribution;
determining the global feature probability distribution:
where W_2 is a trainable parameter, y_g is the label of the global probability distribution, query is the query content (a query sentence or query keywords), and answer is a candidate answer;
determining a loss function (the loss function in predicting the probability of association of a query with a candidate answer) includes:
determining a global loss function:
L G =-logP(y g |query,answer)
determining a distribution difference loss function:
L D =F(P(y u |query,answer),P(y g |query,answer))+F(P(y g |query,answer),P(y u |query,answer))
Determining a loss function:
L=L G +λL D
where λ is a hyper-parameter.
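A sketch of the combined loss L = L_G + λL_D. The divergence F in L_D is not specified above, so a symmetric KL divergence is assumed here; the distributions are passed as probability lists:

```python
import math

def kl(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    """Kullback-Leibler divergence KL(p || q), with eps for stability."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def joint_loss(p_u: list[float], p_g: list[float],
               gold_index: int, lam: float) -> float:
    """L_G is the negative log-likelihood of the gold label under the global
    distribution; L_D is a symmetric divergence (KL assumed) between the
    hierarchical distribution p_u and the global distribution p_g."""
    l_g = -math.log(p_g[gold_index])
    l_d = kl(p_u, p_g) + kl(p_g, p_u)
    return l_g + lam * l_d
```
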
Prediction of the association probability:
reordering to obtain the answers (the answers matching or corresponding to the query content query; for example, selecting the topN answers from the candidate answers): all answer candidate sets are reordered according to the matching degree between the query content query and each candidate answer, and the topN answers (N is a natural number, so the N answers with the largest matching degree are selected) are taken as the final result.
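The topN re-ranking above reduces to a sort over (answer, matching degree) pairs:

```python
def rerank_topn(candidates, n: int) -> list[str]:
    """candidates: iterable of (answer, matching_degree) pairs. Return the n
    answers with the largest matching degree, best first."""
    ordered = sorted(candidates, key=lambda t: t[1], reverse=True)
    return [answer for answer, _ in ordered[:n]]
```
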
In one embodiment, determining the result matching degree of the query content subjected to format processing and each result sub-paragraph comprises:
using the Bert pre-trained language characterization model Bert_1 to determine the semantic feature encoding of the result sub-paragraph of the jth candidate paragraph; and
determining the result matching degree between the format-processed query content u_q and the jth result sub-paragraph,
where a_j is the result sub-paragraph of the jth candidate paragraph.
Determining the matching degree of the query content subjected to format processing and the result subsection based on the text matching degree and the result matching degree, wherein the step comprises the following steps:
performing logarithmic smoothing on the result matching degree to obtain the smoothed matching degree; and
determining, based on the text matching degree and the smoothed matching degree, the matching degree s between the format-processed query content and the result sub-paragraph:
where f is a log smoothing function.
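The exact combination in the patent's formula for s is not reproduced above (it was a formula image); the sketch below assumes the answer-level degree is smoothed with f(x) = log(1 + x) and added to the paragraph-level text matching degree, which is one plausible reading:

```python
import math

def combined_degree(text_degree: float, answer_degree: float) -> float:
    """Combine the paragraph-level text matching degree with the log-smoothed
    answer-level matching degree. Both f(x) = log(1 + x) and the additive
    combination are assumptions, not taken from the patent."""
    return text_degree + math.log1p(answer_degree)
```
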
Selecting at least one target sub-paragraph from the plurality of result sub-paragraphs that is associated with the formatted query content based on a degree of matching of the formatted query content to the result sub-paragraph, including:
sorting the result sub-paragraphs according to the descending order of the matching degree of the query contents and the result sub-paragraphs after format processing so as to generate a sorted list;
acquiring a preset extraction parameter N, and selecting N result subsections with the maximum matching degree from the sorted list;
and determining at least one result sub-paragraph of the N result sub-paragraphs with the maximum matching degree, wherein the matching degree of the at least one result sub-paragraph is greater than a second matching degree threshold value, as the target sub-paragraph.
FIG. 2 is a flow diagram of a reading comprehension method based on multi-document re-ranking according to an embodiment of the present invention.
Typically, document pre-processing is performed first: the natural segments of all candidate documents undergo preliminary segmentation. Since headings contain clear business information and are highly related to the question, the multi-level headings and each section of content are joined together with special symbols; if the result does not exceed the preset maximum length, it is kept as the pre-processing result, and otherwise it is cut further. Finally, a paragraph candidate set over the multiple documents is obtained:
Then, the relevance matching degree between the query and the candidate paragraphs is determined. As shown in FIG. 2, query and paragraph_1, paragraph_2, …, paragraph_N are encoded at the semantic coding layer. For example, determining the semantic feature encoding: the query and the full set of candidate paragraphs are semantically encoded with a Bert pre-trained language characterization model, and the encoding result is expressed as:
u q =Bert 1 (query)
determining matching degree prediction of query and candidate paragraphs:
determining a loss function:
where λ denotes a hyper-parameter; Ω- denotes the set of candidate documents unrelated to the query; and Ω+ denotes the set of candidate documents related to the query.
Filtering candidate paragraphs: the candidate paragraphs are scored according to the matching degree formula above (e.g., a match value or match score is determined for each candidate paragraph), and the relevant candidate paragraphs satisfying the threshold t1 are retained as a set.
Next, at the semantic matching and answer extraction layer, the answer segments of the candidate paragraphs are predicted. As shown in FIG. 2, matching pairs of paragraph_1, paragraph_2, …, paragraph_N with query are constructed, then matching pairs of paragraph_1, paragraph_2, …, paragraph_N with answer are constructed, and finally the matching degree of those paragraph-answer pairs with the query is determined.
Semantic feature coding: semantic coding is carried out on the query and the related candidate paragraphs, a Bert pre-trained language representation model is still adopted, and the coding result is expressed as:
u qj =Bert 2 (concat(query,p j ))
answer start and stop site prediction:
where W_s and W_e are the weight matrices for the start position and the end position, respectively.
Loss function:
L = αCE(P_start, Label_start) + βCE(P_end, Label_end) + γCE(P_span, Label_span)
where CE represents the cross entropy loss function, label is the standard answer Label, and span represents the segment from the start to the end position.
At the reordering layer, the optimal answer is obtained by reordering:
predicting matching degree of query and answer:
where a_j is the answer segment of candidate paragraph j predicted in the preceding step.
And (3) predicting a final answer by combining the matching degree of the query and the candidate paragraphs and the matching degree of the query and the answer:
where f is a log smoothing function.
At the answer prediction layer, the answers in the candidate paragraphs are sorted by s, and the topN answers whose s exceeds the threshold t2 are returned.
Fig. 3 is a model diagram of the reading comprehension algorithm based on multi-document re-ranking according to an embodiment of the present invention. While the multi-document re-ranking reading comprehension model runs, the following is carried out:
step one, preprocessing a document.
Firstly, preliminary segmentation is performed on the natural segments of all candidate documents. Since headings contain clear business information and are highly related to the question, the multi-level headings and each section of content are joined together with special symbols; if the result does not exceed the preset maximum length, it is kept as the pre-processing result, and otherwise it is cut further. Finally, a paragraph candidate set over the multiple documents is obtained:
and (II) building a model and determining an answer corresponding to the query content by using the model.
1. Determining the association degree matching of the query and the candidate paragraphs:
(1.1) carrying out semantic feature coding: semantic coding is respectively carried out on the query and the full candidate paragraphs, a Bert pre-trained language representation model is adopted, and the coding result is expressed as follows:
u q =Bert 1 (query)
(1.2) determining the matching degree prediction of the query and the candidate paragraph:
(1.3) determining a loss function:
where λ denotes a hyper-parameter; Ω- denotes the set of candidate documents unrelated to the query; and Ω+ denotes the set of candidate documents related to the query.
(1.4) filtering candidate paragraphs: the matching degree of each candidate paragraph is determined according to the formula of step (1.2), for example by scoring the candidate paragraphs, and the relevant candidate paragraphs satisfying the threshold t1 are retained as a set.
2. Predicting answer segments of candidate paragraphs:
and (2.1) carrying out semantic feature coding: semantic coding is carried out on the query and the related candidate paragraphs, a Bert pre-trained language characterization model is still adopted, and the coding result is expressed as follows:
u qj =Bert 2 (concat(query,p j ))
(2.2) predicting the start site (or position) and the end site (or position) of the answer:
(2.3) determining a loss function:
L = αCE(P_start, Label_start) + βCE(P_end, Label_end) + γCE(P_span, Label_span)
where CE represents the cross entropy loss function, label is the standard answer Label, and span represents the segment from the start to the end position.
(2.4) filtering candidate answers: a score (e.g., a matching-degree score) is determined for each candidate answer according to step (2.2), the score being the average of the starting-position probability and the ending-position probability, and the relevant candidate answers satisfying the threshold t2 are retained.
3. Predicting the association probability of the query and the candidate answer: (e.g., predicting answer fragments for candidate paragraphs followed by predicting association probabilities between query and candidate answer fragments)
(3.1) semantic feature coding:
[H cls ,H tokens ]=Bert 3 ([query,answer])
wherein H cls Encoding for Global semantic features (Global Embedding);
H tokens encoding semantic features at the character level (token-level Embedding);
answer is a candidate answer which is taken from a candidate answer set;
the query is the query content.
(3.2) decomposing the feature hierarchy (e.g., the feature hierarchy of the model for predicting the association probability between query and candidate answer): the features are hierarchically decomposed into an intent layer, a core entity layer, and a relation layer. The core entity layer masks the character encodings of non-core entities in query and answer; the relation layer masks the character encodings of core entities in query and answer; the intent layer retains the full character encoding. On this basis, the three layers of character-string encodings undergo matrix transformation and, after an average pooling layer, are expressed respectively as:
(3.3) joint probability distribution of hierarchical features:
where σ is the activation function, W_1 is a trainable parameter, and y_u is the label of the joint probability distribution;
(3.4) determining the global feature probability distribution:
where W_2 is a trainable parameter and y_g is the label of the global probability distribution;
(3.5) determining a loss function (the loss function in predicting the association probability of the query with the candidate answer):
(3.5.1) global penalty function:
L G =-logP(y g |query,answer)
(3.5.2) distribution variance loss function:
L D =F(P(y u |query,answer),P(y g |query,answer))+F(P(y g |query,answer),P(y u |query,answer))
(3.6) joint loss function:
L=L G +λL D
where λ is a hyper-parameter.
(3.7) associated probability prediction:
Reordering to obtain the answers: all answer candidate sets are reordered according to the matching degree between query and answer obtained in step (3.7), and the topN answers are selected as the final result, where N is a natural number, such as 5 or 10.
Fig. 4 is a schematic structural diagram of an apparatus for matching content based on matching degree according to an embodiment of the present invention. The apparatus 400 comprises: a processing unit 401, a first determining unit 402, a second determining unit 403, a third determining unit 404, and a selecting unit 405.
The processing unit 401 is configured to obtain an original query content input by a user, and perform format processing on the original query content to obtain a query content subjected to format processing. The processing unit 401 is specifically configured to obtain a content processing rule for performing format processing on original query content; and performing format processing on the original query content based on the content processing rule to obtain the query content subjected to format processing.
The system also comprises a preprocessing unit, a searching unit and a searching unit, wherein the preprocessing unit is used for segmenting each document in a plurality of documents in the text content library according to the natural segments to obtain a plurality of natural segments; a plurality of levels of headings in each document are determined, and each level of heading and at least one natural segment associated with the heading are formed into a text paragraph. The preprocessing unit is further used for determining the number of characters in each text paragraph; determining the text paragraphs with the number of characters larger than the character number threshold value as text paragraphs to be processed; and segmenting the text paragraphs to be processed until the number of characters of any text paragraphs obtained through segmentation is less than or equal to the character number threshold value.
A first determining unit 402, configured to determine a candidate-paragraph matching degree between the format-processed query content and each text paragraph of a plurality of text paragraphs in the text content library, and to determine the text paragraphs whose candidate-paragraph matching degree is greater than a first matching degree threshold as candidate paragraphs.
The first determining unit 402 is specifically configured to use the Bert pre-trained language characterization model Bert_1 to determine the semantic feature encoding u_q of the format-processed query content query:
u_q = Bert_1(query)
use the Bert pre-trained language characterization model Bert_1 to determine the semantic feature encoding of each text paragraph p_j; and
calculate the candidate-paragraph matching degree between the format-processed query content and the jth text paragraph of the plurality of text paragraphs in the text content library,
where 0 < j ≤ na, j is a natural number, and na is the number of text paragraphs in the text content library.
The first determining unit 402 is specifically configured to involve the following loss function when determining the candidate-paragraph matching degree between the format-processed query content and each text paragraph of the plurality of text paragraphs in the text content library, and determining the text paragraphs whose candidate-paragraph matching degree is greater than the first matching degree threshold as candidate paragraphs:
where λ is a hyper-parameter, Ω- is the set of documents irrelevant to the format-processed query content query, and Ω+ is the set of documents relevant to the format-processed query content query.
The first determining unit 402 is specifically configured to, after determining that a text passage with a candidate passage matching degree greater than a first matching degree threshold is a candidate passage, form the candidate passage into a candidate passage set:
a second determining unit 403, configured to select an answer segment associated with the format-processed query content in each candidate paragraph, and determine a matching degree between the format-processed query content and the answer segment of each answer segment.
The second determining unit 403 is specifically configured to use the Bert pre-trained language characterization model Bert_2 to determine the semantic feature encoding u_qj of the answer segment associated with the format-processed query content:
u_qj = Bert_2(concat(query, p_j))
determine the starting position I_start and the ending position I_end of the answer segment within the candidate paragraph:
where W_s is the weight matrix of the starting position, W_e is the weight matrix of the ending position, softmax is the activation function, P_start is the starting-position probability, P_end is the ending-position probability, and len(p_j) is the character length of p_j; and
based on the starting position I_start and the ending position I_end, select the answer segment associated with the format-processed query content in each candidate paragraph p_j.
In selecting an answer segment associated with the formatted query content in each candidate passage, the following loss function is involved:
L = αCE(P_start, Label_start) + βCE(P_end, Label_end) + γCE(P_span, Label_span)
where CE denotes the cross-entropy loss function, Label_start is the starting position of the standard answer label, Label_end is the ending position of the standard answer label, Label_span is the answer segment of the standard answer label from starting to ending position, and α, β, γ are hyper-parameters.
A third determining unit 404, configured to determine a matching degree between the query content subjected to format processing and the answer segment based on the matching degree between the candidate paragraphs and the matching degree between the answer segments.
The third determining unit 404 is specifically configured to:
use the Bert pre-trained language characterization model Bert_1 to determine the semantic feature encoding of the answer segment of the jth candidate paragraph; and
determine the answer-segment matching degree between the format-processed query content u_q and the jth answer segment,
where a_j is the answer segment of the jth candidate paragraph.
The third determining unit 404 is specifically configured to perform logarithmic smoothing on the answer-segment matching degree to obtain the smoothed matching degree, and
to determine, based on the candidate-paragraph matching degree and the smoothed matching degree, the matching degree s between the format-processed query content and the answer segment:
where f is a log smoothing function.
A selecting unit 405, configured to select at least one target sub-paragraph associated with the format-processed query content from the multiple answer segments based on a matching degree between the format-processed query content and the answer segments.
The selecting unit 405 is specifically configured to sort the answer fragments according to a descending order of matching degrees of the query content and the answer fragments subjected to format processing, so as to generate a sorted list;
acquiring a preset extraction parameter N, and selecting N answer segments with the maximum matching degree from the sorted list;
and determining at least one answer segment with the matching degree larger than a second matching degree threshold value in the N answer segments with the maximum matching degree as a target subsection.
Claims (15)
1. A method for content matching based on a degree of matching, the method comprising:
acquiring original query content input by a user, and performing format processing on the original query content to acquire the query content subjected to format processing;
determining a candidate-paragraph matching degree between the format-processed query content and each text paragraph of a plurality of text paragraphs in a text content library, and determining the text paragraphs whose candidate-paragraph matching degree is greater than a first matching degree threshold as candidate paragraphs;
selecting an answer segment associated with the format-processed query content in each candidate paragraph, and determining the matching degree of the format-processed query content and the answer segment of each answer segment;
determining the matching degree of the query content subjected to format processing and the answer segment based on the matching degree of the candidate paragraphs and the matching degree of the answer segment; and
selecting at least one target sub-paragraph associated with the formatted query content from a plurality of answer segments based on a degree of matching of the formatted query content to an answer segment.
2. The method of claim 1, the formatting the original query content to obtain formatted query content, comprising:
acquiring a content processing rule for performing format processing on original query content;
and performing format processing on the original query content based on a content processing rule to obtain the query content subjected to format processing.
3. The method of claim 1, further comprising, prior to obtaining original query content entered by a user,
segmenting each document in the plurality of documents in the text content library according to a natural segment to obtain a plurality of natural segments;
a plurality of levels of headings in each document are determined, and each level of headings and at least one natural segment associated with the headings are formed into a text paragraph.
4. The method of claim 3, further comprising,
determining the number of characters in each text paragraph;
determining the text paragraphs with the number of characters larger than the character number threshold value as text paragraphs to be processed;
and segmenting the text paragraphs to be processed until the number of characters of any text paragraphs obtained through segmentation is smaller than or equal to a character number threshold value.
5. The method of claim 1, the determining a degree of matching of the formatted query content to candidate paragraphs for each of a plurality of text paragraphs within a textual content library, comprising:
using the Bert pre-trained language characterization model Bert_1 to determine the semantic feature encoding u_q of the format-processed query content query:
u_q = Bert_1(query)
using the Bert pre-trained language characterization model Bert_1 to determine the semantic feature encoding of each text paragraph p_j; and
calculating the candidate-paragraph matching degree between the format-processed query content and the jth text paragraph of the plurality of text paragraphs in the text content library,
where 0 < j ≤ na, j is a natural number, and na is the number of text paragraphs in the text content library.
6. The method of claim 5, wherein determining the candidate-paragraph matching degree between the format-processed query content and each of the plurality of text paragraphs in the text content library, and determining the text paragraphs whose candidate-paragraph matching degree is greater than the first matching degree threshold as candidate paragraphs, involves the following loss function:
where λ is a hyper-parameter, Ω- is the set of documents irrelevant to the format-processed query content query, and Ω+ is the set of documents relevant to the format-processed query content query.
8. the method of claim 1, the selecting, in each candidate passage, an answer segment associated with the formatted query content, comprising:
using the Bert pre-trained language characterization model Bert_2 to determine the semantic feature encoding u_qj of the answer segment associated with the format-processed query content:
u_qj = Bert_2(concat(query, p_j))
determining the starting position I_start and the ending position I_end of the answer segment within the candidate paragraph:
where W_s is the weight matrix of the starting position, W_e is the weight matrix of the ending position, softmax is the activation function, P_start is the starting-position probability, P_end is the ending-position probability, and len(p_j) is the character length of p_j; and
based on the starting position I_start and the ending position I_end, selecting the answer segment associated with the format-processed query content in each candidate paragraph p_j.
9. The method of claim 1, when selecting an answer segment associated with the formatted query content in each candidate passage, involving the following loss function:
L = αCE(P_start, Label_start) + βCE(P_end, Label_end) + γCE(P_span, Label_span)
where CE denotes the cross-entropy loss function, Label_start is the starting position of the standard answer label, Label_end is the ending position of the standard answer label, Label_span is the answer segment of the standard answer label from starting to ending position, and α, β, γ are hyper-parameters.
10. The method of claim 1, determining a degree of match of the formatted query content with an answer segment of each answer segment, comprising:
using the Bert pre-trained language characterization model Bert_1 to determine the semantic feature encoding of the answer segment of the jth candidate paragraph; and
determining the answer-segment matching degree between the format-processed query content u_q and the jth answer segment,
where a_j is the answer segment of the jth candidate paragraph.
11. The method of claim 10, wherein determining the matching degree of the formatted query content and the answer segment based on the matching degree of the candidate paragraphs and the matching degree of the answer segment comprises:
performing logarithmic smoothing on the answer-segment matching degree to obtain the smoothed matching degree; and
determining, based on the candidate-paragraph matching degree and the smoothed matching degree, the matching degree s between the format-processed query content and the answer segment:
where f is a log smoothing function.
12. The method of claim 11, selecting at least one target sub-paragraph from a plurality of answer segments associated with the formatted query content based on a degree of matching of the formatted query content to an answer segment, comprising:
sorting the answer fragments according to the descending order of the matching degree of the query contents and the answer fragments after format processing so as to generate a sorted list;
acquiring a preset extraction parameter N, and selecting N answer segments with the maximum matching degree from the sorted list;
and determining at least one answer segment with the matching degree larger than a second matching degree threshold value in the N answer segments with the maximum matching degree as a target subsection.
13. An apparatus for content matching based on a degree of matching, the apparatus comprising:
the processing unit is used for acquiring original query content input by a user and carrying out format processing on the original query content to acquire the query content subjected to format processing;
a first determining unit, configured to determine, for each of a plurality of text paragraphs in a text content library, a candidate paragraph matching degree between the query content subjected to format processing and the text paragraph, and to determine text paragraphs whose matching degree is greater than a first matching degree threshold as candidate paragraphs;
a second determining unit, configured to select, in each candidate paragraph, an answer segment associated with the query content subjected to format processing, and to determine, for each answer segment, an answer segment matching degree between the query content subjected to format processing and the answer segment;
a third determining unit, configured to determine, based on the candidate paragraph matching degree and the answer segment matching degree, a matching degree between the query content subjected to format processing and the answer segment; and
and the selecting unit is used for selecting at least one target sub-paragraph associated with the query content subjected to format processing from a plurality of answer segments based on the matching degree of the query content subjected to format processing and the answer segments.
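The five units of claim 13 form a retrieval pipeline. A toy end-to-end sketch under assumed stand-ins — the token-overlap scorer, sentence-split segment extraction, and log-smoothed product are all illustrative placeholders for the claimed units, not the patent's actual methods:

```python
import math

def normalize(q: str) -> str:
    # Processing unit: a toy format step (the patent's format processing is unspecified).
    return q.strip().lower()

def overlap(q: str, text: str) -> float:
    # Toy matching degree: fraction of query tokens appearing in the text.
    toks = normalize(q).split()
    return sum(t in text.lower() for t in toks) / max(len(toks), 1)

def answer_query(raw_query, paragraphs, first_threshold, n, second_threshold):
    query = normalize(raw_query)
    # First determining unit: keep paragraphs above the first matching degree threshold.
    candidates = [p for p in paragraphs if overlap(query, p) > first_threshold]
    scored = []
    for p in candidates:
        # Second determining unit: take the best-matching sentence as the answer segment.
        seg = max(p.split(". "), key=lambda s: overlap(query, s))
        # Third determining unit: combine paragraph and log-smoothed segment matching degrees.
        s = overlap(query, p) * math.log(1.0 + overlap(query, seg))
        scored.append((seg, s))
    # Selecting unit: descending sort, top n, second threshold.
    scored.sort(key=lambda x: x[1], reverse=True)
    return [seg for seg, s in scored[:n] if s > second_threshold]
```

The point of the two-stage score is that a segment is returned only when both its paragraph and the segment itself match the query, which is what distinguishes this apparatus from plain paragraph retrieval.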
14. A computer-readable storage medium, characterized in that the storage medium stores a computer program for performing the method of any of claims 1-12.
15. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any one of claims 1-12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211074234.1A CN115470332B (en) | 2022-09-02 | 2022-09-02 | Intelligent question-answering system for content matching based on matching degree |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115470332A true CN115470332A (en) | 2022-12-13 |
CN115470332B CN115470332B (en) | 2023-03-31 |
Family
ID=84368655
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211074234.1A Active CN115470332B (en) | 2022-09-02 | 2022-09-02 | Intelligent question-answering system for content matching based on matching degree |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115470332B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200210489A1 (en) * | 2018-12-27 | 2020-07-02 | International Business Machines Corporation | Extended query performance prediction framework utilizing passage-level information |
CN111460089A (en) * | 2020-02-18 | 2020-07-28 | 北京邮电大学 | Multi-paragraph reading understanding candidate answer sorting method and device |
CN112163079A (en) * | 2020-09-30 | 2021-01-01 | 民生科技有限责任公司 | Intelligent conversation method and system based on reading understanding model |
CN112417105A (en) * | 2020-10-16 | 2021-02-26 | 泰康保险集团股份有限公司 | Question and answer processing method and device, storage medium and electronic equipment |
CN113449754A (en) * | 2020-03-26 | 2021-09-28 | 百度在线网络技术(北京)有限公司 | Method, device, equipment and medium for training and displaying matching model of label |
Non-Patent Citations (2)
Title |
---|
Yohan Kim et al., "Question answering method for infrastructure damage information retrieval from textual data using bidirectional encoder representations from transformers", Elsevier * |
Huang Yong, "Identifying the structural functions of academic texts: paragraph-based identification" (学术文本的结构功能识别——基于段落的识别), Journal of the China Society for Scientific and Technical Information (情报学报) * |
Also Published As
Publication number | Publication date |
---|---|
CN115470332B (en) | 2023-03-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109271505B (en) | Question-answering system implementation method based on question-answer pairs | |
CN111639171B (en) | Knowledge graph question-answering method and device | |
CN106649260B (en) | Product characteristic structure tree construction method based on comment text mining | |
CN110188272B (en) | Community question-answering website label recommendation method based on user background | |
CN110597735A (en) | Software defect prediction method for open-source software defect feature deep learning | |
CN111966812B (en) | Automatic question answering method based on dynamic word vector and storage medium | |
CN112464656A (en) | Keyword extraction method and device, electronic equipment and storage medium | |
CN112051986B (en) | Code search recommendation device and method based on open source knowledge | |
CN116342167B (en) | Intelligent cost measurement method and device based on sequence labeling named entity recognition | |
CN111858842A (en) | Judicial case screening method based on LDA topic model | |
CN115982338A (en) | Query path ordering-based domain knowledge graph question-answering method and system | |
CN114048354A (en) | Test question retrieval method, device and medium based on multi-element characterization and metric learning | |
CN114265935A (en) | Science and technology project establishment management auxiliary decision-making method and system based on text mining | |
CN117009521A (en) | Knowledge-graph-based intelligent process retrieval and matching method for engine | |
CN115905487A (en) | Document question and answer method, system, electronic equipment and storage medium | |
CN114970563A (en) | Chinese question generation method and system fusing content and form diversity | |
CN111104503A (en) | Construction engineering quality acceptance standard question-answering system and construction method thereof | |
CN116049376B (en) | Method, device and system for retrieving and replying information and creating knowledge | |
CN112667819A (en) | Entity description reasoning knowledge base construction and reasoning evidence quantitative information acquisition method and device | |
CN115470332B (en) | Intelligent question-answering system for content matching based on matching degree | |
CN115840815A (en) | Automatic abstract generation method based on pointer key information | |
CN115238705A (en) | Semantic analysis result reordering method and system | |
CN115204519A (en) | Knowledge information fused domain patent quality grade prediction method | |
CN115017260A (en) | Keyword generation method based on subtopic modeling | |
CN114580556A (en) | Method and device for pre-evaluating patent literature |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||