CN112115238B - Question-answering method and system based on BERT and knowledge base

Question-answering method and system based on BERT and knowledge base

Info

Publication number
CN112115238B
CN112115238B
Authority
CN
China
Prior art keywords
bert
question
answer
text
knowledge base
Prior art date
Legal status
Active
Application number
CN202011177960.7A
Other languages
Chinese (zh)
Other versions
CN112115238A (en)
Inventor
廖伟智
黄明彤
阴艳超
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202011177960.7A
Publication of CN112115238A
Application granted
Publication of CN112115238B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295: Named entity recognition
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G06N3/045: Combinations of networks
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/02: Knowledge representation; Symbolic representation

Abstract

The invention discloses a question-answering method and system based on BERT and a knowledge base, applied to the field of information retrieval and aimed at the defects of existing knowledge-base question-answering systems. A BERT-CRF named entity recognition model and a BERT text similarity binary classification model are constructed and trained; the two trained models then process the question corpus to be answered, obtain the correct answer to the question, and automatically rewrite the answer.

Description

Question-answering method and system based on BERT and knowledge base
Technical Field
The invention belongs to the field of information retrieval, and particularly relates to question-and-answer retrieval technology.
Background
Traditional question-and-answer search is based on keyword retrieval and does not consider the semantic information of the question text. In a knowledge-base question-answering system, a questioner inputs a specific question text; the system analyzes and processes the question text online, then retrieves and outputs the best-matching answer text, so that the questioner obtains a quick and accurate answer.
The knowledge base question-answering system and method are mainly divided into three categories:
1) Information retrieval-based method
The question entity and the attribute relation are extracted from the question text, which are then used to search the knowledge base.
2) Method based on semantic analysis
The question text is parsed into a logical expression used to search the knowledge base; the search result is then converted into an answer.
3) Deep learning-based method
The question text is preprocessed into vectorized input, the triple texts in the knowledge base are mapped into the same vector space, and similarity is analyzed and computed to obtain the triple with the highest similarity.
The prior art has the defects that:
1. methods based on semantic analysis face obstacles between logical expressions and natural-language semantics;
2. methods based on information retrieval cannot analyze the semantic information in the question text, and in particular cannot fully exploit context information to disambiguate entities;
3. existing models such as CNN, RNN and Bi-LSTM do not reach the training effect, accuracy, F1 value and the like of leading-edge models such as BERT and Transformer, and lack correlation analysis between the words or characters in the question text.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a question-answering method and system based on BERT (Bidirectional Encoder Representations from Transformers) and a knowledge base.
One of the technical schemes adopted by the invention is as follows: a question-answering method based on BERT and a knowledge base comprises the following steps:
A. acquiring question and answer corpora used for constructing a knowledge base and used for BERT downstream task training, and preprocessing the question and answer corpora;
B. constructing a question-answer knowledge base from the question-answer corpus preprocessed in step A;
C. constructing a BERT-based language model from the question-answer corpus preprocessed in step A;
D. acquiring the training question-answer corpus data of the BERT language model according to step C, and labeling it to form a labeled corpus;
E. constructing a named entity recognition model based on BERT-CRF and the language model, from the BERT language model obtained in step C and the labeled corpus preprocessed in step D;
F. constructing a text similarity binary classification model based on BERT and the language model, from the BERT language model obtained in step C and the labeled corpus preprocessed in step D;
G. training the BERT-CRF (Conditional Random Field) model obtained in step E and the BERT text similarity binary classification model obtained in step F with the labeled corpus, to obtain a parameter-weighted BERT-CRF language model and a parameter-weighted BERT text similarity binary classification model;
H. using the parameter-weighted models obtained in steps E, F and G, together with the question-answer knowledge base obtained in step B, processing the question corpus to be answered to obtain the correct answer to the question, and automatically rewriting the answer.
The question-answer corpus preprocessed in step A comprises: an entity labeling data set, a sample set for sentence similarity matching derived from the entity labeling data set, and a triple set comprising question entities, attribute entities and answer texts.
Step B constructs the question-answer knowledge base from the triple set.
The second technical scheme adopted by the invention is as follows: a question-answering system based on BERT and a knowledge base comprises a question text input module, a BERT-CRF named entity recognition module, a knowledge base retrieval module, a BERT attribute recognition module and an answer generation module. The question text input module is used for inputting a question text and vectorizing the text; the BERT-CRF named entity recognition module is used for performing named entity recognition on the question text and recognizing the question entity; the knowledge base retrieval module is used for retrieving the question entity to obtain candidate triples, feeding the candidate attributes back to the BERT attribute recognition module, and combining the best attribute fed back by the BERT attribute recognition module with the question entity to obtain the final best triple; the BERT attribute recognition module is used for performing correlation analysis on the candidate attributes and the question text to obtain the best attribute, and feeding the best attribute back to the knowledge base retrieval module; and the answer generation module is used for rewriting the best triple obtained by the knowledge base retrieval module into an answer text and outputting it to the questioner.
The invention has the following beneficial effects: the question-answering method and system based on BERT and a knowledge base combine a BERT-CRF named entity model with a BERT text similarity binary classification model and use the multi-head attention mechanism to better exploit the relations between words or characters, obtaining deeper semantic representations through BERT word embeddings. The BERT-CRF named entity recognition model reaches an average F1 value of 99.4% on the NLPCC-ICCPOL 2016 KBQA public data set, improving recognition accuracy in the question-answering process, and more accurate answers are obtained in combination with knowledge base retrieval.
Drawings
FIG. 1 is a flow chart of a protocol of the present invention;
FIG. 2 is a diagram illustrating an overall architecture of a BERT pre-training language model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a bidirectional Transformer layer according to an embodiment of the present invention;
FIG. 4 is a named entity recognition model based on BERT-CRF and language model provided by the embodiment of the invention;
FIG. 5 is a two-classification model of text similarity based on BERT and language models provided by an embodiment of the present invention;
FIG. 6 is a block diagram of the question-answering system based on BERT and a knowledge base according to an embodiment of the present invention.
Detailed Description
In order to facilitate the understanding of the technical contents of the present invention by those skilled in the art, the present invention will be further explained with reference to the accompanying drawings.
As shown in fig. 1, the question-answering method based on BERT and knowledge base of the present invention includes the following steps:
A. acquiring question and answer corpora used for constructing a knowledge base and used for BERT downstream task training, and preprocessing the question and answer corpora;
B. constructing a question-answer knowledge base from the question-answer corpus preprocessed in step A, wherein each triple is composed of a question entity, an attribute entity and an answer text, and the triples are stored as the question-answer knowledge base;
C. constructing a BERT-based language model from the question-answer corpus preprocessed in step A;
D. acquiring the training question-answer corpus data of the BERT language model according to step C, and labeling it to form a labeled corpus;
E. constructing a named entity recognition model based on BERT-CRF and the language model, from the BERT language model obtained in step C and the labeled corpus preprocessed in step D;
F. constructing a text similarity binary classification model based on BERT and the language model, from the BERT language model obtained in step C and the labeled corpus preprocessed in step D;
G. training the BERT-CRF model obtained in step E and the BERT text similarity binary classification model obtained in step F with the labeled corpus, to obtain a parameter-weighted BERT-CRF language model and a parameter-weighted BERT text similarity binary classification model;
H. using the parameter-weighted models obtained in steps E, F and G, together with the question-answer knowledge base obtained in step B, processing the question corpus to be answered to obtain the correct answer to the question, and automatically rewriting the answer.
In step A, question-answer corpora used for constructing the knowledge base and for BERT downstream task training are acquired and preprocessed. This specifically comprises the following steps:
A1. dividing the original question-answer pair data into a training set, a validation set and a test set, wherein each pair of data comprises four components: question text, question entity, attribute entity and answer text;
Raw data example:
What types of patents are there? (question text); patent (question entity); type (attribute entity); invention, utility model and design (answer text);
A2. automatically generating entity labeling data from the original question-answer pair training, validation and test sets, i.e., constructing sample sets for entity recognition training: an entity recognition training set, validation set and test set in which the entity sequences are labeled, used for training the BERT-CRF model;
A3. from the data in the entity recognition training, validation and test sets of step A2, constructing attribute-association training, validation and test sets of sample pairs for sentence similarity matching, used for the binary classification task, i.e., for training the BERT binary classification model;
A4. processing the original data used for constructing the question-answer knowledge base: the original data containing question texts, question entities, attribute entities and answer texts are processed into a clean triple data set comprising question entities, attribute entities and answer texts.
Triple data example:
{ (question entity), (attribute entity), (answer text) }
{ patent, type, invention, utility model and design }
The triple data set processed in step A4 is then loaded into and stored in a database.
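As an illustration of step A4, the following minimal Python sketch builds the clean triple set and loads it into a database; the SQLite backend, table name and field names are assumptions, since the patent does not name a specific database.

```python
import sqlite3

def build_triples(raw_rows):
    """Strip the question text from raw rows, keeping (question entity, attribute entity, answer text)."""
    return [(r["question_entity"], r["attribute_entity"], r["answer_text"]) for r in raw_rows]

raw = [{"question_text": "What types of patents are there?",
        "question_entity": "patent", "attribute_entity": "type",
        "answer_text": "invention, utility model and design"}]

conn = sqlite3.connect("kbqa.db")
conn.execute("CREATE TABLE IF NOT EXISTS triples (entity TEXT, attribute TEXT, answer TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", build_triples(raw))
conn.commit()
```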
Step C constructs the BERT-based language model from the question-answer corpus preprocessed in step A. It comprises the following steps:
A BERT pre-training language model is constructed; this model has strong language feature extraction capability and allows downstream tasks to extract features online. The overall architecture of the BERT pre-training language model is shown in FIG. 2, and the construction process comprises the following sub-steps:
C1. constructing the Embedding layer, which is formed by summing three types of embeddings (Token Embeddings, Segment Embeddings and Position Embeddings), as sketched in code after the three items below:
token entries are word vectors, the first word is the CLS Token, and can be used for subsequent classification tasks
Segment Embeddings are used to distinguish two sentences because pre-training does not just do LM but also do classification tasks with two sentences as input
Position Embeddings are learned from trigonometric functions
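The following is a minimal PyTorch sketch of such an Embedding layer summing the three embeddings; the vocabulary size, hidden dimension and maximum length are illustrative defaults, not values fixed by the patent.

```python
import torch
import torch.nn as nn

class BertEmbedding(nn.Module):
    def __init__(self, vocab_size=21128, hidden=768, max_len=512):
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden)      # word vectors; [CLS] is the first token
        self.segment = nn.Embedding(2, hidden)             # distinguishes sentence A from sentence B
        self.position = nn.Embedding(max_len, hidden)      # learned position embeddings
        self.norm = nn.LayerNorm(hidden)

    def forward(self, token_ids, segment_ids):
        # positions 0..L-1 broadcast over the batch; the three embeddings are summed
        pos_ids = torch.arange(token_ids.size(1)).unsqueeze(0).expand_as(token_ids)
        return self.norm(self.token(token_ids) + self.segment(segment_ids) + self.position(pos_ids))
```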
C2. Masked LM, used for training a deep bidirectional language representation: a portion of the original corpus is masked, and the masked words or characters are then predicted from their context. 15% of the characters in each sentence are randomly selected for prediction. Of these, 80% are replaced by the [MASK] token, e.g. "What types of patents are there?" → "What types of [MASK][MASK] are there?"; 10% are replaced by a randomly chosen word, e.g. "What types of patents are there?" → "What types of apples are there?"; and the remaining 10% are kept unchanged, i.e. "What types of patents are there?" → "What types of patents are there?".
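A minimal sketch of this 15% / 80-10-10 masking procedure, assuming a simple token-list representation of a sentence (the function name and vocabulary here are illustrative, not part of the patent):

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15):
    """Randomly choose ~15% of positions; of those, 80% -> [MASK], 10% -> random token, 10% unchanged."""
    masked, labels = list(tokens), {}
    for i in range(len(tokens)):
        if random.random() < mask_rate:
            labels[i] = tokens[i]                 # target to predict from context
            r = random.random()
            if r < 0.8:
                masked[i] = "[MASK]"              # 80%: replace with the mask token
            elif r < 0.9:
                masked[i] = random.choice(vocab)  # 10%: replace with a random token
            # else: 10% keep the original token
    return masked, labels
```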
C3. constructing the bidirectional Transformer layer structure, a deep network based on the self-attention mechanism; the structure is shown in FIG. 3.
The key part of this layer structure is the self-attention mechanism, which obtains word representations mainly by adjusting a weight coefficient matrix according to the degree of association between words in the same sentence:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

wherein Q denotes the Query matrix, K the Key matrix and V the Value matrix, with $Q,K,V\in R^{n\times d_k}$, where R denotes the set of real numbers and $d_k$ is the input vector dimension of Q and K; $\sqrt{d_k}$ is a scaling (penalty) factor. Through the self-attention mechanism, the characters or words in a sentence are related to one another, expressing to a certain extent the relevance of different words or characters within the sentence.
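A minimal PyTorch sketch of this scaled dot-product self-attention formula (single-head, with batching and masking omitted):

```python
import math
import torch

def self_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V; weights reflect word-word association."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # sqrt(d_k) is the scaling (penalty) factor
    return torch.softmax(scores, dim=-1) @ V
```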
Each sub-layer (the self-attention layer and the feed-forward network layer) is followed by a residual Add module and a Normalize layer-normalization module, i.e., the Add & Normalize layers in FIG. 3. The residual connection addresses the difficulty of training deep networks; layer normalization over the last dimension prevents the values in the layers from changing too much, accelerating the training process and allowing the model to converge faster.
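A minimal sketch of one such Add & Normalize step in PyTorch, assuming the hidden size is the last dimension:

```python
import torch.nn as nn

class AddNorm(nn.Module):
    """Residual connection followed by layer normalization over the last dimension."""
    def __init__(self, hidden=768):
        super().__init__()
        self.norm = nn.LayerNorm(hidden)

    def forward(self, x, sublayer_out):
        return self.norm(x + sublayer_out)  # Add (residual) then Normalize
```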
Step D acquires the training question-answer corpus data of the BERT language model and labels it to form the labeled corpus.
D1. The corpus portion for BERT-CRF entity recognition is labeled with the BIO scheme; only the question entity needs to be labeled, multiple entity types are unnecessary, and character-based BIO labeling is used uniformly. Example (character-by-character labeling of "What types of patents are there?", where the two characters of the entity "patent" are tagged):
专/利/有/哪/些/类/型? → B-NER/I-NER/O/O/O/O/O
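A minimal sketch of this character-based BIO labeling; the helper name is hypothetical, and the entity is assumed to occur contiguously in the question:

```python
def bio_tags(question, entity):
    """Character-based BIO labels: B-NER/I-NER over the question entity, O elsewhere."""
    tags = ["O"] * len(question)
    start = question.find(entity)
    if start >= 0:
        tags[start] = "B-NER"
        for i in range(start + 1, start + len(entity)):
            tags[i] = "I-NER"
    return list(zip(question, tags))

print(bio_tags("专利有哪些类型?", "专利"))
# [('专', 'B-NER'), ('利', 'I-NER'), ('有', 'O'), ...]
```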
D2. The training corpus of the BERT attribute similarity model is labeled with 0 and 1, and 5 negative samples are automatically drawn at random for each positive sample. Each example takes the form "question + attribute + 0/1" (the original figure shows a table of such labeled examples).
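A minimal sketch of this 0/1 labeling with 5 random negative samples per question; the attribute inventory passed in is an assumption, since the patent does not specify where the negatives are drawn from:

```python
import random

def make_pairs(question, gold_attribute, all_attributes, n_neg=5):
    """One positive pair (label 1) plus up to 5 randomly sampled negative attributes (label 0)."""
    pairs = [(question, gold_attribute, 1)]
    candidates = [a for a in all_attributes if a != gold_attribute]
    for attr in random.sample(candidates, min(n_neg, len(candidates))):
        pairs.append((question, attr, 0))
    return pairs
```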
Step E constructs the named entity recognition model based on BERT-CRF and the language model, from the BERT language model obtained in step C and the labeled corpus preprocessed in step D; as shown in FIG. 4, it comprises the following steps:
E1. An entity recognition model is constructed for the downstream entity recognition task; the BERT principle is the same as in step C.
E2. The CRF layer obtains the globally optimal label sequence by considering the adjacency relations between labels; it is used to segment and label sequence data, and is a discriminative method that predicts an output sequence from an input sequence. Applied to named entity recognition, given the text sequence to be predicted $X=\{x_1,x_2,\dots,x_n\}$ and the output prediction sequence of the BERT model $Y=\{y_1,y_2,\dots,y_n\}$, an evaluation score is defined as follows:

$$\mathrm{score}(X,Y)=\sum_{i=0}^{n} W_{y_i,\,y_{i+1}}+\sum_{i=1}^{n} P_{i,\,y_i}$$

wherein W denotes the label transition matrix, $W_{i,j}$ the score of transitioning from label i to label j, n the sequence length, and $P_{i,y_i}$ the score of label $y_i$ at position i.

The probability of a predicted sequence given the original sequence is then computed as:

$$P(Y\mid X)=\frac{\exp(\mathrm{score}(X,Y))}{\sum_{Y'}\exp(\mathrm{score}(X,Y'))}$$

where the sum runs over all possible label sequences $Y'$.
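A minimal PyTorch sketch of the score computation for one tagged sequence, omitting the start/stop transitions at i = 0 and i = n for brevity:

```python
import torch

def crf_score(emissions, tags, transitions):
    """score(X, Y) = sum_i W[y_i, y_{i+1}] + sum_i P[i, y_i] for one tagged sequence.

    emissions:   (n, num_tags) per-position label scores P from the BERT encoder
    tags:        (n,) label ids Y
    transitions: (num_tags, num_tags) label transition matrix W
    """
    emit = emissions[torch.arange(len(tags)), tags].sum()   # sum of P[i, y_i]
    trans = transitions[tags[:-1], tags[1:]].sum()          # sum of W[y_i, y_{i+1}]
    return emit + trans
```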
Step F constructs the text similarity binary classification model based on BERT and the language model, from the BERT language model obtained in step C and the labeled corpus preprocessed in step D.
F1. A BERT downstream task is constructed for attribute similarity training and for testing question attributes; the structure is shown in FIG. 5.
Step G trains the BERT-CRF model obtained in step E and the BERT text similarity binary classification model obtained in step F with the labeled corpus, yielding a parameter-weighted BERT-CRF language model and a parameter-weighted BERT text similarity binary classification model.
Step H uses the parameter-weighted BERT-CRF language model and BERT text similarity binary classification model obtained in steps E, F and G, combined with the question-answer knowledge base obtained in step B, to process the question corpus to be answered, obtain the correct answer to the question, and automatically rewrite the answer.
H1. For the question text, the parameter-weighted BERT-CRF model extracts the entity online, and the knowledge base is queried to obtain candidate triples {(question entity), (attribute entity), (answer text)}.
H2. The BERT text attribute similarity binary classification model predicts the relevance between the question text and each candidate attribute entity, keeping the match labeled 1.
H3. The exact triple text is obtained, the correct answer is rewritten, and the answer to the question is output.
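A minimal end-to-end sketch of steps H1-H3; the interfaces of the trained models and the knowledge base (ner_model, sim_model, kb.lookup) are stand-ins, and the answer template is an assumption, since the patent does not specify the rewriting rule:

```python
def answer(question, ner_model, sim_model, kb):
    """Online question answering: NER -> knowledge-base lookup -> attribute matching -> answer rewriting."""
    entity = ner_model(question)                 # H1: extract the question entity with BERT-CRF
    candidates = kb.lookup(entity)               # H1: candidate (entity, attribute, answer) triples
    best = max(candidates,                       # H2: attribute most relevant to the question text
               key=lambda t: sim_model(question, t[1]))
    return f"The {best[1]} of {best[0]} is: {best[2]}."  # H3: rewrite the best triple as an answer
```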
FIG. 6 shows the system part of the invention, comprising: the question text input module, used for inputting a question text and vectorizing the text; the BERT-CRF named entity recognition module, used for performing named entity recognition on the question text and recognizing the question entity; the knowledge base retrieval module, used for retrieving the question entity to obtain candidate triples, feeding the candidate attributes back to the BERT attribute recognition module, and combining the best attribute fed back by the BERT attribute recognition module with the question entity to obtain the final best triple; the BERT attribute recognition module, used for performing correlation analysis on the candidate attributes and the question text to obtain the best attribute and feeding it back to the knowledge base retrieval module; and the answer generation module, used for rewriting the best triple obtained by the knowledge base retrieval module into an answer text and outputting it to the questioner.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the scope of protection is not limited to the specific statements and embodiments recited herein. Various modifications and alterations of this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included in the scope of the claims of the present invention.

Claims (6)

1. A question-answering method based on BERT and a knowledge base is characterized by comprising the following steps:
A. acquiring question and answer corpora used for constructing a knowledge base and used for BERT downstream task training, and preprocessing the question and answer corpora;
B. constructing a question-answer knowledge base from the question-answer corpus preprocessed in step A;
C. constructing a BERT-based language model from the question-answer corpus preprocessed in step A; step C comprises the following sub-steps:
C1. constructing an Embedding layer formed by summing three types of embeddings: Token Embeddings, Segment Embeddings and Position Embeddings;
C2. Masked LM, used for training a deep bidirectional language representation, specifically: a portion of the original corpus is masked and the masked words or characters are predicted; 15% of the characters in each sentence are randomly selected and predicted from their context, of which 80% are replaced by the [MASK] token, 10% are replaced by a randomly chosen word, and the remaining 10% are kept unchanged;
C3. constructing a bidirectional Transformer layer structure based on the self-attention mechanism;
D. acquiring the training question-answer corpus data of the BERT language model according to step C, and labeling it to form a labeled corpus;
E. constructing a named entity recognition model based on BERT-CRF and the language model, from the BERT language model obtained in step C and the labeled corpus preprocessed in step D;
F. constructing a text similarity binary classification model based on BERT and the language model, from the BERT language model obtained in step C and the labeled corpus preprocessed in step D;
G. training the BERT-CRF model obtained in step E and the BERT text similarity binary classification model obtained in step F with the labeled corpus, to obtain a parameter-weighted BERT-CRF language model and a parameter-weighted BERT text similarity binary classification model;
H. using the parameter-weighted models obtained in steps E, F and G, together with the question-answer knowledge base obtained in step B, processing the question corpus to be answered to obtain the correct answer to the question, and automatically rewriting the answer.
2. The question-answering method based on BERT and a knowledge base according to claim 1, wherein the question-answer corpus preprocessed in step A comprises: an entity labeling data set, a sample set for sentence similarity matching derived from the entity labeling data set, and a triple set comprising question entities, attribute entities and answer texts.
3. The question-answering method based on BERT and a knowledge base according to claim 1, wherein the question-answer knowledge base in step B is constructed from the triple set.
4. The question-answering method based on BERT and a knowledge base according to claim 1, wherein the self-attention mechanism adjusts the weight coefficient matrix according to the degree of association between words in the same sentence to obtain the word representation:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

wherein Q denotes the Query vector, K the Key vector, V the Value vector, $d_k$ the input vector dimension of Q and K, and $\sqrt{d_k}$ is a scaling (penalty) factor.
5. The question-answering method based on BERT and a knowledge base according to claim 1, wherein step D comprises:
D1. labeling the corpus portion for BERT-CRF entity recognition with the BIO scheme;
D2. labeling the training corpus of the BERT attribute similarity model with 0 and 1.
6. A BERT and knowledge base based question-answering system comprising: the system comprises a question text input module, a BERT-CRF named entity identification module, a knowledge base retrieval module, a BERT attribute identification module and an answer generation module;
the construction process of the BERT-CRF named entity recognition module comprises the following steps:
C1. constructing an Embedding layer formed by summing three types of embeddings: Token Embeddings, Segment Embeddings and Position Embeddings;
C2. Masked LM, used for training a deep bidirectional language representation, specifically: a portion of the original corpus is masked and the masked words or characters are predicted; 15% of the characters in each sentence are randomly selected and predicted from their context, of which 80% are replaced by the [MASK] token, 10% are replaced by a randomly chosen word, and the remaining 10% are kept unchanged;
c3, constructing a bidirectional Transformer layer structure based on a self-attention mechanism;
the question text input module is used for inputting a question text and vectorizing the text; the BERT-CRF named entity recognition module is used for carrying out named entity recognition on the question text and recognizing the question entity; the knowledge base retrieval module is used for retrieving the problem entities to obtain candidate triple entities, feeding back the candidate attributes to the BERT attribute identification module, and combining the best attributes fed back by the BERT attribute identification module with the problem entities to obtain the final best triple; the BERT attribute identification module is used for carrying out correlation analysis on the candidate attributes and the problem text to obtain the optimal attributes, and feeding the optimal attributes back to the knowledge base; and the answer generating module is used for rewriting the optimal triple obtained by the knowledge base searching module into an answer text and outputting the answer text to the questioner.
CN202011177960.7A 2020-10-29 2020-10-29 Question-answering method and system based on BERT and knowledge base Active CN112115238B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011177960.7A CN112115238B (en) 2020-10-29 2020-10-29 Question-answering method and system based on BERT and knowledge base

Publications (2)

Publication Number Publication Date
CN112115238A CN112115238A (en) 2020-12-22
CN112115238B true CN112115238B (en) 2022-11-15

Family

ID=73794987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011177960.7A Active CN112115238B (en) 2020-10-29 2020-10-29 Question-answering method and system based on BERT and knowledge base

Country Status (1)

Country Link
CN (1) CN112115238B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112667808A (en) * 2020-12-23 2021-04-16 沈阳新松机器人自动化股份有限公司 BERT model-based relationship extraction method and system
CN112765314B (en) * 2020-12-31 2023-08-18 广东电网有限责任公司 Power information retrieval method based on power ontology knowledge base
CN112733541A (en) * 2021-01-06 2021-04-30 重庆邮电大学 Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism
CN113360606A (en) * 2021-06-24 2021-09-07 哈尔滨工业大学 Knowledge graph question-answer joint training method based on Filter
CN113553410B (en) * 2021-06-30 2023-09-22 北京百度网讯科技有限公司 Long document processing method, processing device, electronic equipment and storage medium
CN113435213B (en) * 2021-07-09 2024-04-30 支付宝(杭州)信息技术有限公司 Method and device for returning answers to user questions and knowledge base
CN113689851B (en) * 2021-07-27 2024-02-02 国家电网有限公司 Scheduling professional language understanding system and method
CN113642862A (en) * 2021-07-29 2021-11-12 国网江苏省电力有限公司 Method and system for identifying named entities of power grid dispatching instructions based on BERT-MBIGRU-CRF model
CN113808709B (en) * 2021-08-31 2024-03-22 天津师范大学 Psychological elasticity prediction method and system based on text analysis
CN114398256A (en) * 2021-12-06 2022-04-26 南京行者易智能交通科技有限公司 Big data automatic testing method based on Bert model
CN115422934B (en) * 2022-07-08 2023-06-16 中国科学院空间应用工程与技术中心 Entity identification and linking method and system for space text data
CN116089594B (en) * 2023-04-07 2023-07-25 之江实验室 Method and device for extracting structured data from text based on BERT question-answering model
CN116595148B (en) * 2023-05-25 2023-12-29 北京快牛智营科技有限公司 Method and system for realizing dialogue flow by using large language model
CN116756295B (en) * 2023-08-16 2023-11-03 北京盛通知行教育科技集团有限公司 Knowledge base retrieval method, device and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390023A (en) * 2019-07-02 2019-10-29 安徽继远软件有限公司 A kind of knowledge mapping construction method based on improvement BERT model
CN110516055A (en) * 2019-08-16 2019-11-29 西北工业大学 A kind of cross-platform intelligent answer implementation method for teaching task of combination BERT
CN111027595A (en) * 2019-11-19 2020-04-17 电子科技大学 Double-stage semantic word vector generation method
CN111090990A (en) * 2019-12-10 2020-05-01 中电健康云科技有限公司 Medical examination report single character recognition and correction method
CN111414465A (en) * 2020-03-16 2020-07-14 北京明略软件系统有限公司 Processing method and device in question-answering system based on knowledge graph
CN111563383A (en) * 2020-04-09 2020-08-21 华南理工大学 Chinese named entity identification method based on BERT and semi CRF
CN111680511A (en) * 2020-04-21 2020-09-18 华东师范大学 Military field named entity identification method with cooperation of multiple neural networks
CN111767368A (en) * 2020-05-27 2020-10-13 重庆邮电大学 Question-answer knowledge graph construction method based on entity link and storage medium
CN111831792A (en) * 2020-07-03 2020-10-27 国网江苏省电力有限公司信息通信分公司 Electric power knowledge base construction method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110442676A (en) * 2019-07-02 2019-11-12 北京邮电大学 Patent retrieval method and device based on more wheel dialogues
CN110765257B (en) * 2019-12-30 2020-03-31 杭州识度科技有限公司 Intelligent consulting system of law of knowledge map driving type
CN111159385B (en) * 2019-12-31 2023-07-04 南京烽火星空通信发展有限公司 Template-free general intelligent question-answering method based on dynamic knowledge graph


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Implementation of Chinese Intelligent Question Answering Technology Based on Machine Reading Comprehension; Jia Xin; China Master's Theses Full-text Database, Information Science and Technology; 2020-07-15; I138-1594 *

Also Published As

Publication number Publication date
CN112115238A (en) 2020-12-22

Similar Documents

Publication Publication Date Title
CN112115238B (en) Question-answering method and system based on BERT and knowledge base
CN111444721B (en) Chinese text key information extraction method based on pre-training language model
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN112989834B (en) Named entity identification method and system based on flat grid enhanced linear converter
CN111931506B (en) Entity relationship extraction method based on graph information enhancement
CN109271537B (en) Text-to-image generation method and system based on distillation learning
CN111046179B (en) Text classification method for open network question in specific field
CN111858896B (en) Knowledge base question-answering method based on deep learning
CN112270188B (en) Questioning type analysis path recommendation method, system and storage medium
CN113962219A (en) Semantic matching method and system for knowledge retrieval and question answering of power transformer
CN111914556B (en) Emotion guiding method and system based on emotion semantic transfer pattern
CN112328800A (en) System and method for automatically generating programming specification question answers
CN116127090B (en) Aviation system knowledge graph construction method based on fusion and semi-supervision information extraction
CN114491024A (en) Small sample-based specific field multi-label text classification method
CN114781375A (en) Military equipment relation extraction method based on BERT and attention mechanism
CN112800184A (en) Short text comment emotion analysis method based on Target-Aspect-Opinion joint extraction
CN112988970A (en) Text matching algorithm serving intelligent question-answering system
CN111666374A (en) Method for integrating additional knowledge information into deep language model
CN116522165B (en) Public opinion text matching system and method based on twin structure
CN117454898A (en) Method and device for realizing legal entity standardized output according to input text
CN113641809A (en) XLNET-BiGRU-CRF-based intelligent question answering method
CN117056451A (en) New energy automobile complaint text aspect-viewpoint pair extraction method based on context enhancement
CN115204143B (en) Method and system for calculating text similarity based on prompt
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism
CN115169349A (en) Chinese electronic resume named entity recognition method based on ALBERT

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant