CN112417100A - Knowledge graph for the Liao dynasty (Liaodai) historical and cultural domain and method for constructing an intelligent question-answering system based on it - Google Patents


Publication number
CN112417100A
Authority
CN
China
Legal status
Pending
Application number
CN202011313409.0A
Other languages
Chinese (zh)
Inventor
刘爽
谭楠楠
孟佳娜
于玉海
赵丹丹
Current Assignee
Dalian Minzu University
Original Assignee
Dalian Minzu University
Application filed by Dalian Minzu University
Priority to CN202011313409.0A
Publication of CN112417100A


Classifications

    • G06F16/367 Ontology (G06F16/36 Creation of semantic tools, e.g. ontology or thesauri)
    • G06F16/3329 Natural language query formulation or dialogue systems (G06F16/33 Querying; G06F16/332 Query formulation)
    • G06F16/3331 Query processing (G06F16/33 Querying)
    • G06F16/35 Clustering; Classification
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods


Abstract

The invention relates to the field of artificial-intelligence question answering and discloses a knowledge graph for the Liao dynasty historical and cultural domain, together with a method for constructing an intelligent question-answering system based on it. The technical solution is as follows: design the graph schema according to the entity categories; acquire the corresponding data according to the schema design; process the raw corpus; perform named entity recognition and relation extraction on the processed corpus; and construct the knowledge graph. Beneficial effects: building a knowledge graph of Liao dynasty history and culture effectively integrates scattered knowledge. This makes the knowledge easy to apply widely across industries and promotes the dissemination of related cultural information. Exploring the practical application of the knowledge graph, the invention builds an intelligent question-answering system for Liao dynasty history and culture, which helps users retrieve knowledge in this domain more efficiently.

Description

Knowledge graph for the Liao dynasty historical and cultural domain and method for constructing an intelligent question-answering system based on it
Technical Field
The invention belongs to the field of artificial-intelligence question answering. It specifically relates to two methods: a method for constructing a knowledge graph in the Liao dynasty historical and cultural domain, and a method for constructing an intelligent question-answering system based on that knowledge graph.
Background
The Liao dynasty was a frontier dynasty founded mainly by the Khitan nobility, an ethnic minority of ancient northern China. Under Liao rule, the cultures of different ethnic groups collided and blended, which gave Liao politics, economy, thought and culture a strongly diverse character. The founding of the Liao also advanced the course of Chinese history and promoted ethnic integration, and many of its scientific and cultural achievements are still in use today. In the era of the internet and big data, more and more historical and cultural knowledge is presented on encyclopedia platforms and history and culture websites. Extracting the knowledge a user needs from this mass of data has therefore become a key problem. Knowledge graph technology can systematically distill large volumes of redundant data into structured knowledge. It is widely applied in intelligent search, question-answering systems, recommendation systems and similar applications.
Knowledge graph technology was proposed by Google in 2012. Its original purpose was to improve search engines and give users a high-quality search experience. A knowledge graph is essentially a semantic network that organizes large amounts of scattered knowledge in a structured way. Combining a knowledge graph with a question-answering system lets users grasp the relevant domain knowledge intuitively. Large general-domain knowledge bases such as Freebase, Wikidata and DBpedia have become the main sources of knowledge graph data. However, data on Liao dynasty history are relatively scarce, and knowledge graphs for the Liao-history vertical domain are scarcer still. An intelligent question-answering system based on a Liao dynasty historical and cultural knowledge graph is therefore valuable to historical researchers.
Disclosure of Invention
The technical problem addressed by the invention is to provide a knowledge-graph-based intelligent question-answering method for Liao dynasty history and culture. The method stores and represents Liao dynasty historical and cultural knowledge as a knowledge graph. A user enters a question in natural language, and the corresponding answer is retrieved from the knowledge base and returned to the user in natural language. This makes it easier for users to acquire knowledge and lets them obtain the information they need more accurately and quickly. The technical solution is as follows:
a method for constructing a knowledge graph in the Liao dynasty historical and cultural domain comprises the following steps:
Step 1: design the graph schema according to the entity categories;
Step 2: acquire the corresponding data according to the schema design;
Step 3: process the raw corpus;
Step 4: perform named entity recognition and relation extraction on the processed corpus;
Step 5: construct the knowledge graph.
Further, for step 1, the entity categories are Chinese names, alternative names, cities, historical figures, languages, ethnic groups, military deployments, institutional systems, art forms, science and technology, foreign relations, clothing, hair ornaments, trade, population, religion, folk customs and architecture. Each category contains multiple entities. For step 2, web crawlers acquire the related structured, semi-structured and unstructured data from encyclopedia websites, related books and history websites according to the schema design.
Further, for step 3, the jieba segmentation tool performs word segmentation and part-of-speech tagging on the data, and punctuation marks and stop words are removed.
Further, for step 4, the semi-structured data are stored as a whole. A deep learning method performs entity recognition and relation extraction on the unstructured data, and the resulting data then undergo knowledge fusion.
Further, in step 5, the data organized in step 4 are stored in Neo4j.
The invention further includes a method for constructing an intelligent question-answering system based on the Liao dynasty historical and cultural knowledge graph. It comprises the following steps:
Step 1: perform named entity recognition on the natural-language question entered by the user;
Step 2: identify the intent of the question;
Step 3: search the knowledge base for the answer and return it;
Step 4: construct a question-answer library;
Step 5: perform deep semantic matching on the question-answer library to generate the answer to return.
Further, in step 1, the question entered by the user is preprocessed, and entity recognition is then performed with a deep learning method.
Further, in step 2, the question intent is identified with a TextCNN convolutional neural network.
Further, in step 3, a Cypher query is built from the entity obtained in step 1 and the relation or attribute obtained in step 2. The query is then used to search for the answer in the Neo4j graph database.
Further, in step 4, if no matching triple is found in step 3, relevant question-answer websites and forums are crawled. For each question, the top 2 answers by number of likes are stored in the question-answer library, with earlier answers preferred when like counts are equal. In step 5, a twin (Siamese) network performs deep semantic matching on the question-answer library to construct the answer.
Beneficial effects:
Building a knowledge graph of Liao dynasty history and culture effectively integrates scattered knowledge. This makes the knowledge easy to apply widely across industries and promotes the dissemination of related cultural information. Exploring the practical application of the knowledge graph, the invention builds an intelligent question-answering system for Liao dynasty history and culture, which helps users retrieve knowledge in this domain more efficiently.
Drawings
FIG. 1 is an overall structural view of the present invention;
FIG. 2 is a flow chart of the intelligent question-answering system of the present invention;
FIG. 3 is a diagram of the BiLSTM-CRF network structure for named entity recognition in the present invention;
FIG. 4 is a diagram of the TextCNN network structure for question intent recognition in the present invention;
FIG. 5 is a diagram of the structure of the twin LSTM-MatchPyramid model of the present invention;
FIG. 6 shows database visualization result 1 of the present invention;
FIG. 7 shows database visualization result 2 of the present invention;
FIG. 8 shows the web page visualization of the present invention.
Detailed Description
The specific steps for constructing the knowledge-graph-based intelligent question-answering system for Liao dynasty history and culture are described in more detail below with reference to the drawings.
The embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
The invention mainly comprises two modules:
Module 1: construction of the Liao dynasty historical and cultural knowledge graph;
Module 2: construction of the intelligent question-answering system.
For the first module, a method for constructing the Liao dynasty historical and cultural knowledge graph is provided; its overall structure is shown in FIG. 1. The knowledge graph is designed according to requirements, and the data are obtained with web crawlers. The data are then processed and extracted and stored in a Neo4j database. Each step is described in detail below.
Step 1: Graph schema design
This is the most critical step in building a graph for a given domain. Based on an analysis of Liao dynasty history and culture, the invention defines the following entity categories for the domain graph: Chinese names, alternative names, cities, historical figures, languages, ethnic groups, military deployments, institutional systems, art forms, science and technology, foreign relations, clothing, hair ornaments, trade, population, religion, folk customs, architecture, territorial divisions and funeral taboos. Each category contains multiple entities. Little structured historical data exists, so most of the corpus comes from web articles and history books, and defining entity types manually in advance cannot guarantee full coverage. The titles of web articles and the tables of contents of history books are therefore summarized into entity categories. Attributes are defined for each entity according to its historical characteristics to express its intrinsic meaning. For example, historical figures carry attributes such as courtesy name and art name, as well as official posts and noble titles. Relations describe the connections between entities, and between entities and attributes. For example, the triple (Yelü Abaoji, ethnicity, Khitan) can be created between a historical figure and an ethnic group.
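The following is a minimal sketch of how the schema above might be represented in code before import. It assumes that each entity carries one category from a fixed set and that relation triples link two typed entities. The category names, the `Entity`/`Triple` classes and the `relation_triple` helper are hypothetical and only illustrate the (head entity, relation, tail entity) structure.

```python
from dataclasses import dataclass

# Hypothetical subset of the entity categories defined in the schema design.
ENTITY_CATEGORIES = {"HistoricalFigure", "Ethnicity", "City", "Religion"}


@dataclass(frozen=True)
class Entity:
    name: str
    category: str

    def __post_init__(self):
        # Reject entities whose category is not part of the schema.
        if self.category not in ENTITY_CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


@dataclass(frozen=True)
class Triple:
    head: Entity      # head entity
    relation: str     # relation name, e.g. "ethnicity"
    tail: Entity      # tail entity


def relation_triple(head, head_cat, relation, tail, tail_cat):
    """Build a schema-checked relation triple (hypothetical helper)."""
    return Triple(Entity(head, head_cat), relation, Entity(tail, tail_cat))


# The triple from the schema description above.
example = relation_triple("Yelü Abaoji", "HistoricalFigure", "ethnicity",
                          "Khitan", "Ethnicity")
```

Freezing the dataclasses makes triples hashable, which is convenient later when duplicates are removed during knowledge fusion.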
Step 2: Acquiring the corresponding data
According to the schema design, a crawler collects structured data from sources such as Baidu Baike and history websites. It also collects semi-structured data, and unstructured data such as History of Liao textbooks and web articles.
Step 3: Processing the raw corpus
The raw corpus is cleaned by removing stop words, special symbols and repeated words. The collected data are then segmented with the jieba tool and a custom dictionary built around Liao dynasty characteristics. Following rules such as the naming conventions of Liao-era ethnic minorities, regular expressions keep the longest character strings before and after stop words. These candidates are screened manually and added to the custom dictionary. Consider the sentence "Yelü Tulübu and Yelü Lubugu created the script". Without the custom dictionary, the last character of Yelü Tulübu's name is wrongly attached to the following word "and", producing the unrelated word "buhe" (discord). With the custom dictionary, both names are kept whole and "and" is segmented on its own. If the unstructured corpus contains traditional Chinese characters, the text is first extracted with Baidu's text recognition tool and then converted to simplified characters with Word. The segmented data are then screened manually, and step 4 is executed once their accuracy has been ensured.
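The effect of the custom dictionary can be illustrated with a toy segmenter. The sketch below swaps in greedy forward maximum matching, which is simpler than jieba's actual algorithm (a prefix-dictionary DAG with dynamic programming plus an HMM for unknown words). It assumes a tiny hypothetical vocabulary, and the sentence is an illustrative reconstruction of the naming example above. In jieba itself, a user dictionary file is loaded with `jieba.load_userdict`.

```python
def forward_max_match(text, vocab, max_len=5):
    """Greedy forward maximum matching: at each position take the longest
    word found in `vocab`, falling back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + length]
            if length == 1 or word in vocab:
                tokens.append(word)
                i += length
                break
    return tokens


def remove_stopwords(tokens, stopwords):
    """Drop stop words and punctuation listed in `stopwords`."""
    return [t for t in tokens if t not in stopwords]


# Hypothetical general vocabulary: contains the common word 不和 ("discord").
BASE_VOCAB = {"耶律", "不和", "创制", "文字", "和"}
# Hypothetical custom dictionary entries for Liao-era personal names.
CUSTOM_DICT = {"耶律突吕不", "耶律鲁不古"}
SENTENCE = "耶律突吕不和耶律鲁不古创制文字"

without_custom = forward_max_match(SENTENCE, BASE_VOCAB)
with_custom = forward_max_match(SENTENCE, BASE_VOCAB | CUSTOM_DICT)
```

Without the custom entries, the segmenter produces 耶律/突/吕/不和/..., splitting the first name and creating the spurious 不和. With them, it returns 耶律突吕不/和/耶律鲁不古/创制/文字.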
Step 4: Named entity recognition and relation extraction
The data from step 3 are processed according to their type. Structured data are organized and stored, and semi-structured data are extracted manually. Unstructured data are first annotated and then extracted with the deep learning model ALBERT-BiLSTM-CRF.
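Annotating the unstructured corpus for a sequence-labeling model such as ALBERT-BiLSTM-CRF typically means converting entity spans to per-character BIO tags and back. The sketch below shows both directions. It assumes character-level tokens, and the labels `PER` (person) and `ETH` (ethnic group) are hypothetical.

```python
def to_bio(tokens, spans):
    """Convert (start, end_exclusive, label) spans over token indices to BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for k in range(start + 1, end):
            tags[k] = f"I-{label}"
    return tags


def from_bio(tokens, tags):
    """Recover (entity_text, label) pairs from a BIO tag sequence.
    An I- tag that does not continue the current entity is ignored."""
    entities, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(("".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)
        else:
            if current:
                entities.append(("".join(current), label))
            current, label = [], None
    if current:
        entities.append(("".join(current), label))
    return entities


chars = list("阿保机是契丹人")  # "Abaoji was Khitan", split into characters
tags = to_bio(chars, [(0, 3, "PER"), (4, 6, "ETH")])
```

Here `tags` becomes `["B-PER", "I-PER", "I-PER", "O", "B-ETH", "I-ETH", "O"]`, and `from_bio(chars, tags)` recovers the two entities.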
Step 5: Building the knowledge graph
The data from step 4 are imported into the Neo4j database with Cypher statements.
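One way to generate the import statements is to use idempotent `MERGE` clauses with query parameters. The sketch below assumes the data model suggested by the Cypher examples later in this document: nodes carry a `name` property, and relations are `Relation` relationships with a `name` property. Cypher cannot parameterize node labels, so labels are validated before being interpolated. The function name `merge_triple` is hypothetical. The resulting pair could then be executed with the official Neo4j Python driver, for example `session.run(query, params)`.

```python
import re

# Node labels cannot be Cypher parameters, so only plain identifiers are allowed.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def merge_triple(head, head_label, relation, tail, tail_label):
    """Return (query, params) that idempotently creates one relation triple."""
    for label in (head_label, tail_label):
        if not _IDENT.match(label):
            raise ValueError(f"unsafe label: {label!r}")
    query = (
        f"MERGE (h:{head_label} {{name: $head}}) "
        f"MERGE (t:{tail_label} {{name: $tail}}) "
        "MERGE (h)-[r:Relation {name: $rel}]->(t)"
    )
    return query, {"head": head, "tail": tail, "rel": relation}


query, params = merge_triple("Yelü Abaoji", "Person", "ethnicity", "Khitan", "nature")
```

Using `MERGE` rather than `CREATE` means re-running the import does not duplicate nodes or edges. Passing names as parameters avoids quoting problems with historical names.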
For the second module, a method for constructing an intelligent question-answering system based on the Liao dynasty historical and cultural knowledge graph is provided. It comprises the following steps:
Step 1: Named entity recognition on the user's natural-language question
The question entered by the user is first preprocessed and converted into word vectors. Entity recognition is then performed with a BiLSTM-CRF model.
Step 2: Question intent recognition
Question intent recognition is treated as a relation classification problem, in which the relation extracted from the question corresponds to the relation or attribute in a triple. Questions are generally single-hop and mostly take the form of short texts, so this project uses TextCNN to classify them.
Step 3: Searching the knowledge base and returning the answer
The entity and relation obtained in steps 1 and 2 are queried in the Neo4j graph database using Cypher. Once the corresponding entity or attribute value is found, an answer is constructed and returned. If nothing is found in the knowledge base, step 4 is executed.
Step 4: Constructing the question-answer library
Question-answer websites and forums in the relevant domain are crawled, and high-accuracy question-answer pairs are selected. After preprocessing, these pairs form the question-answer library.
Step 5: Deep semantic matching on the question-answer library from step 4 to generate the answer
When no answer is found in the knowledge base, a twin network and an interaction matrix perform deep semantic matching on the question-answer library. The highest-scoring answer is returned to the user.
Example 2
As shown in FIG. 1, the method for constructing the knowledge-graph question-answering system for Liao dynasty history and culture consists mainly of five parts:
Step 1: design the Liao dynasty historical and cultural graph;
Step 2: acquire data in the Liao dynasty historical and cultural domain;
Step 3: extract and fuse knowledge from the domain data;
Step 4: construct the knowledge graph;
Step 5: implement a question-answering system for knowledge in the Liao dynasty historical and cultural domain.
Each step is described in detail below:
Step 1: The entity types, entity relations and entity attributes of the knowledge graph are determined through a systematic analysis of Liao dynasty historical information drawn from encyclopedia websites and related history websites. For example, a historical-figure class is defined whose entities include the Liao emperors and notable figures in every field of the dynasty. Each person's biography serves as the attributes describing the entity, and relations link historical-figure entities to other entities.
Step 2: The data crawled from major websites fall into three types: structured, semi-structured and unstructured.
Step 3: Data in the different storage forms are extracted and fused separately.
Structured data are saved to a list after acquisition.
For semi-structured data, the page structures of the encyclopedia and history websites are analyzed with XPath, and the corresponding knowledge is captured with the Scrapy crawler framework.
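The idea of turning an encyclopedia infobox into attribute triples can be sketched with the standard library. The sketch assumes a hypothetical, well-formed infobox fragment laid out as `<dt>`/`<dd>` pairs; real page markup differs and is usually not well-formed XML. `xml.etree.ElementTree` supports only a subset of XPath. Inside a Scrapy spider, the equivalent selection would be written as `response.xpath("//dt/text()").getall()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical infobox fragment; field names and layout are assumptions.
SAMPLE_INFOBOX = """
<div class="basic-info">
  <dl>
    <dt>Founder</dt><dd>Yelü Abaoji</dd>
    <dt>Period</dt><dd>907-1125</dd>
  </dl>
</div>
"""


def infobox_to_triples(entity, xhtml):
    """Pair each <dt> key with the following <dd> value as (entity, attribute, value)."""
    root = ET.fromstring(xhtml.strip())
    keys = [dt.text.strip() for dt in root.findall(".//dt")]
    values = [dd.text.strip() for dd in root.findall(".//dd")]
    return [(entity, k, v) for k, v in zip(keys, values)]


triples = infobox_to_triples("Liao dynasty", SAMPLE_INFOBOX)
```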
Unstructured data, such as crawled web articles and History of Liao textbooks, consist of long passages of text, so named entity recognition is needed to extract the required entities. The project uses the joint learning model ALBERT-BiLSTM-CRF to extract domain-specific entities, mainly through the following steps:
Step one: segment the collected data with jieba and the custom dictionary, and remove stop words. After segmentation, incorrectly segmented results are added to the custom dictionary.
Step two: train on the constructed corpus, annotated in the BIO scheme, in which each element is labeled B-XX, I-XX or O. B marks the beginning of an entity and XX the defined entity class; I marks the inside (continuation) of an entity; O marks all other, irrelevant characters.
Step three: the model uses a pre-trained BERT model to generate context-aware word vectors. These vectors are fed into the BiLSTM layer, which captures the forward and backward semantic dependencies of each word. The result is finally passed to the CRF layer to further improve the accuracy of sequence labeling.
Step four: link and fuse the extracted entities and relations.
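A simple form of the linking and fusion in step four maps alternative names to one canonical entity and then drops duplicate triples. The sketch below assumes a hand-maintained alias table, which fits naturally with the "alternative names" entity category. The table contents, the relation name "founded" and the helper names are hypothetical. Real entity linking usually also needs similarity-based candidate matching.

```python
# Hypothetical alias table: alternative name -> canonical entity name.
ALIASES = {
    "Abaoji": "Yelü Abaoji",
    "Emperor Taizu of Liao": "Yelü Abaoji",
}


def canonical(name, aliases=ALIASES):
    """Link a mention to its canonical entity name (identity if unknown)."""
    return aliases.get(name, name)


def fuse(triples, aliases=ALIASES):
    """Canonicalize heads and tails, then drop duplicates in first-seen order."""
    seen, fused = set(), []
    for head, rel, tail in triples:
        t = (canonical(head, aliases), rel, canonical(tail, aliases))
        if t not in seen:
            seen.add(t)
            fused.append(t)
    return fused


raw = [
    ("Abaoji", "ethnicity", "Khitan"),
    ("Yelü Abaoji", "ethnicity", "Khitan"),
    ("Emperor Taizu of Liao", "founded", "Liao dynasty"),
]
fused = fuse(raw)
```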
Step 4: store the triples in the Neo4j database.
Step 5: as shown in FIG. 2, the construction of the intelligent question-answering system for Liao dynasty history and culture includes:
Step [1]: perform named entity recognition on the natural-language question;
Step [2]: identify the intent of the question;
Step [3]: search the knowledge base for the answer and return it;
Step [4]: construct a question-answer library;
Step [5]: perform deep semantic matching on the question-answer library to generate the answer to return.
Step [1]: Entity recognition on the question uses a BiLSTM-CRF model. The question is first segmented and stop words are removed; word embeddings then turn it into the input of the BiLSTM layer of the entity recognition model. The project trains the embeddings with the Skip-gram model of the word2vec tool. The model is shown in FIG. 3.
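What Skip-gram actually learns from can be shown by listing its training pairs. Each word predicts the words within a fixed window around it. The sketch below generates those (center, context) pairs in pure Python. It assumes a fixed window; gensim, by contrast, samples a smaller effective window per word by default. The tokens are a hypothetical segmented question. With gensim 4, training itself would be `Word2Vec(sentences, vector_size=300, window=5, sg=1)`, where `sg=1` selects Skip-gram.

```python
def skipgram_pairs(tokens, window=5):
    """(center, context) pairs: every token within `window` positions of
    the center word, excluding the center itself."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical segmented question: "Who was the founding emperor of the Liao?"
question = ["辽代", "开国", "皇帝", "是", "谁"]
pairs = skipgram_pairs(question, window=5)
```

With a window of 5 and only five tokens, every word is paired with every other word, giving 20 pairs.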
Model training is described as follows:
1) Embedding layer: word vectors must be pre-trained before entity recognition to provide the input of the embedding layer. The project trains word-level vectors. The input question is first segmented with jieba and its stop words removed; word vectors are then pre-trained with the word2vec tool in gensim, with a vector dimension of 300 and a window size of 5.
2) BiLSTM layer: combining a forward LSTM and a backward LSTM into a bidirectional LSTM mitigates the long-term dependency problem and better captures bidirectional semantic information. The hidden layer of an LSTM consists of a forget gate $f_t$, an input (memory) gate $i_t$ and an output gate $o_t$. The forget gate $f_t$ decides which information is discarded, and the input gate $i_t$ decides the proportion of new information to be memorized. The BiLSTM consists of two LSTM layers running in opposite directions. Their outputs are concatenated, and the concatenation is used as the input to the CRF layer.
3) CRF layer: a conditional random field is a sequence labeling model. It is trained on the label scores output by the BiLSTM layer and learns transition weights and the resulting probability distribution over label sequences. By constraining the output sequence, the CRF layer avoids invalid label sequences that the BiLSTM layer alone might produce.
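The constraint effect of the CRF layer shows up at decoding time: Viterbi search picks the best label sequence under emission scores plus transition scores. This rules out sequences such as O followed by I-PER. The sketch below is a pure-Python Viterbi decoder. The tag set, the hand-set emission scores standing in for BiLSTM outputs and the transition penalties are all hypothetical. In a trained model, the transitions are learned rather than fixed by hand.

```python
NEG = -1e4  # large penalty marking a forbidden transition

TAGS = ["O", "B-PER", "I-PER"]
START = [0.0, 0.0, NEG]          # a sequence may not start with I-PER
TRANS = [                        # TRANS[prev][cur]
    [0.0, 0.0, NEG],             # O -> I-PER is forbidden
    [0.0, 0.0, 0.0],             # B-PER -> anything
    [0.0, 0.0, 0.0],             # I-PER -> anything
]


def viterbi(emissions, trans=TRANS, start=START):
    """Return the highest-scoring tag-index path and its score."""
    n = len(start)
    score = [start[j] + emissions[0][j] for j in range(n)]
    backpointers = []
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + trans[i][j])
            new_score.append(score[best_i] + trans[best_i][j] + emit[j])
            ptr.append(best_i)
        score = new_score
        backpointers.append(ptr)
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1], score[best]


# Per-token label scores as a BiLSTM might emit them (hand-set here).
emissions = [[1.0, 0.9, 0.0], [0.0, 0.0, 1.0]]
greedy = [max(range(3), key=lambda j: e[j]) for e in emissions]
path, best_score = viterbi(emissions)
```

Taking the per-token maximum gives O, I-PER, which is an invalid sequence. Viterbi decoding under the transition constraints returns B-PER, I-PER instead.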
Step [2]: The intent of a question is the relation or attribute in the triple, so identifying the intent amounts to identifying the second element of the triple. Questions entered by users are usually short texts, so the project treats intent recognition as a short-text classification problem. Short texts are loosely structured, and their intent can be determined from local features. A convolutional neural network is therefore used to classify questions. The model is shown in FIG. 4.
Model training is described as follows:
a) Embedding layer: word vectors trained with the Skip-gram model of word2vec are used as the input of the embedding layer.
The sentence matrix has size $n \times k$, where $n$ is the number of words in the sentence and $k$ is the word-vector dimension.
b) Convolutional layer: the TextCNN model uses one-dimensional convolution layers with kernel (window) sizes 2, 3 and 4 to extract different text features. The feature obtained by the convolution at position $i$ is

$$c_i = f\left(w \cdot x_{i:i+h-1} + b\right) \qquad (1)$$

where $f$ is the activation function, $w$ is an $h \times k$ weight matrix, $h$ is the number of words in the window, $x_{i:i+h-1}$ is the $h \times k$ window formed by rows $i$ to $i+h-1$ of the input matrix, and $b$ is the bias. Together, all $c_i$ form the feature map produced by the convolutional layer.
c) Pooling layer: the project uses max pooling to keep the most important feature of each feature map, turning the results into a one-dimensional vector. This reduces dimensionality, the number of parameters and the amount of computation, and helps reduce the risk of overfitting.
d) Fully connected layer: the pooled results are concatenated and fed into the fully connected layer. A hidden layer and a softmax layer then serve as the classifier that assigns each question to an intent class.
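Formula (1) and the max-pooling step can be checked on a toy example. The sketch below computes $c_i$ for every window of one filter per window size $h \in \{2, 3, 4\}$, applies ReLU as the assumed activation $f$, max-pools each feature map and concatenates the results. The 4 × 2 sentence matrix and the all-ones filters are hypothetical. A real TextCNN uses many learned filters per size, followed by the fully connected and softmax layers described in d), which are omitted here.

```python
def relu(v):
    return max(0.0, v)


def conv1d_feature_map(x, w, b, f=relu):
    """Formula (1): c_i = f(w · x_{i:i+h-1} + b) for every window of h rows.
    x is an n x k sentence matrix, w is an h x k filter."""
    n, h, k = len(x), len(w), len(w[0])
    feature_map = []
    for i in range(n - h + 1):
        s = sum(w[r][c] * x[i + r][c] for r in range(h) for c in range(k))
        feature_map.append(f(s + b))
    return feature_map


def textcnn_features(x, filters):
    """Max-pool each filter's feature map and concatenate the results."""
    return [max(conv1d_feature_map(x, w, b)) for w, b in filters]


x = [[1, 0], [0, 1], [1, 1], [0, 0]]                 # toy 4-word sentence, k = 2
filters = [([[1, 1]] * h, 0.0) for h in (2, 3, 4)]   # one all-ones filter per h
features = textcnn_features(x, filters)
```

For $h = 2$ the feature map is `[2.0, 3.0, 2.0]`, so max pooling keeps 3.0. The concatenated vector is `[3.0, 4.0, 4.0]`.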
Step [3]: Triples in the knowledge graph take the form (head entity, relation, tail entity) or (entity, attribute, attribute value). The head entity comes from named entity recognition in step [1], and the relation or attribute comes from intent recognition in step [2]. With these two elements known, a Cypher query can look up the corresponding tail entity or attribute value in the Neo4j graph database and return the answer. If no triple is found, that is, no answer is retrieved, step [4] is executed.
Some of the Cypher statements are as follows:
1. Query the ethnicity of a historical figure: MATCH (p:Person)-[r:Relation]->(m:nature) WHERE p.name = $name RETURN p.name, r.name, m.name
2. Query the alternative names of a historical figure: MATCH (p:Person) WHERE p.name = $name RETURN p.name, p.Pname
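The two inputs of step [3] can be combined into a parameterized query, with the fallback to step [4] triggered when nothing comes back. The sketch below assumes the `Relation`/`name` data model of the statements above. For attributes, it assumes Cypher's dynamic property access (`h[$attr]`), which keeps the attribute name a parameter. The function names and the `kind` switch are hypothetical.

```python
def build_lookup(entity, intent, kind):
    """Build (query, params) from the NER entity (step [1]) and the
    classified intent (step [2]).
    kind == "relation": follow a Relation edge whose name is the intent.
    kind == "attribute": read the node property named by the intent."""
    if kind == "relation":
        query = ("MATCH (h {name: $name})-[r:Relation {name: $rel}]->(t) "
                 "RETURN t.name AS answer")
        return query, {"name": entity, "rel": intent}
    if kind == "attribute":
        # Dynamic property access keeps the attribute name parameterized.
        query = "MATCH (h {name: $name}) RETURN h[$attr] AS answer"
        return query, {"name": entity, "attr": intent}
    raise ValueError(f"unknown kind: {kind!r}")


def answer_or_fallback(rows):
    """Return the non-empty answers, or None to signal that step [4] should run."""
    answers = [row["answer"] for row in rows if row.get("answer") is not None]
    return answers or None


query, params = build_lookup("Yelü Abaoji", "ethnicity", "relation")
```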
Sample question-answer results (text output, without visualization):
a. User input: Who was the founding emperor of the Liao dynasty?
Liao Encyclopedia: Abaoji
b. User input: Is there any information about the Great Liao dynasty?
Liao Encyclopedia: The Liao dynasty (907-1125 AD) was a feudal dynasty founded in northern China by the Khitan people. In 916, Emperor Taizu of Liao, Abaoji, unified the Khitan tribes and proclaimed himself khan. He adopted the state name "Khitan" and set the capital at Linhuang Fu, in present-day Bairin Left Banner, Inner Mongolia. In 947, during the Five Dynasties period, Emperor Taizong of Liao led his army south into the Central Plains and changed the state name to "Liao"; in 1066 it was renamed "Great Liao". In 1125 the Liao was destroyed by the Jin dynasty. Its territory reached the Sea of Japan in the east and the Altai Mountains in the west. In the north it extended to the Argun River and the Greater Khingan Range, and in the south to the Baigou River in present-day Hebei. The Khitan were a nomadic people. The Liao emperors developed agriculture and animal husbandry side by side and established a distinctive and relatively complete administrative system. The Liao governed each ethnic group according to its own customs, pioneered a dual administrative system, created the Khitan script and preserved their own culture.
c. User input: What were the main ethnic groups of the Liao dynasty?
Liao Encyclopedia: The main ethnic groups of the Liao dynasty included the Khitan, Xi, Han, Zubu, Jurchen and Shiwei.
Step [4]: Crawlers collect the questions, answers, answer times and like counts from the Q&A pages of question-answer websites, forums and history websites, and the crawled data are then filtered. Users browsing a page tend to like answers they find helpful. For a question with multiple answers, the top 3 answers by number of likes are therefore selected, with the earlier answer preferred when like counts are equal, and stored in a file. A deep learning model extracts the topic of each question, which is stored as its title. The final question-answer library has the form (title, question, answer, likes).
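The screening rule (most likes first, earlier answer on ties, keep the top k) can be sketched as a single sort key. The sketch assumes crawled answers have already been parsed into text, like count and timestamp. The class and function names, the title "founder" and the sample records are hypothetical. `k` defaults to 3 as in this step; the summary of the invention states 2, so it is left as a parameter.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CrawledAnswer:
    text: str
    likes: int
    answered_at: datetime


def top_answers(answers, k=3):
    """Most-liked answers first; on equal likes, the earlier answer wins."""
    return sorted(answers, key=lambda a: (-a.likes, a.answered_at))[:k]


def library_entries(title, question, answers, k=3):
    """Rows in the (title, question, answer, likes) form of the QA library."""
    return [(title, question, a.text, a.likes) for a in top_answers(answers, k)]


answers = [
    CrawledAnswer("A", 5, datetime(2020, 1, 2)),
    CrawledAnswer("B", 9, datetime(2020, 1, 5)),
    CrawledAnswer("C", 5, datetime(2020, 1, 1)),
    CrawledAnswer("D", 1, datetime(2020, 1, 3)),
]
rows = library_entries("founder", "who founded the liao dynasty", answers)
```

Answers A and C both have 5 likes, so the earlier one, C, is ranked ahead of A, and D is dropped.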
Step [5]: When no relevant answer can be retrieved from the knowledge base, the question entered by the user is answered from the constructed question-answer library. Retrieval semantically matches the user's question against the questions in the library. The search first looks up the title column of the library using the relation identified by question intent recognition, so it can quickly locate the relevant part of the library; this avoids the extra parameters and computation of semantically matching the user's question directly against every question. A deep learning model then models the semantic similarity between the user's question and the candidate questions in the library. Because one question in the library may have several answers, when candidates have the same semantic similarity, the answer with the higher like count is returned to the user.
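A rough sketch of this two-stage lookup, assuming library rows shaped as (title, question, answer, likes). The function names are hypothetical, and `char_jaccard` is only a placeholder for the twin LSTM-MatchPyramid matching model; any function returning a similarity score can be passed in instead.

```python
from typing import Callable, List, Optional, Tuple

# One row of the question-answer library: (title, question, answer, likes)
QARecord = Tuple[str, str, str, int]


def char_jaccard(a: str, b: str) -> float:
    """Placeholder similarity (Jaccard overlap of character sets), not the deep model."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def answer_from_library(
    user_question: str,
    relation: str,
    library: List[QARecord],
    similarity: Callable[[str, str], float] = char_jaccard,
) -> Optional[str]:
    # Stage 1: only rows whose title matches the recognized relation are scored,
    # so the (expensive) semantic matcher never sees the whole library.
    candidates = [row for row in library if row[0] == relation]
    if not candidates:
        return None
    # Stage 2: highest similarity wins; equal similarity falls back to more likes.
    best = max(candidates, key=lambda row: (similarity(user_question, row[1]), row[3]))
    return best[2]
```

Because `max` compares the key tuples element by element, the like count only matters when two candidates have exactly the same similarity score.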
Model selection: When deep learning is used for semantic matching, two factors of a sentence pair must be considered: 1. the semantic difference between the two sentences, and 2. the interaction between the two sentences. For the first factor, a twin (Siamese) network effectively captures the context information of each sentence but ignores the connection between the two sentences; for the second factor, the MatchPyramid neural network focuses on the interaction between the two sentences. Combining the two extracts richer feature information and avoids the weakness of either model alone. Therefore, a combination of a twin LSTM and MatchPyramid is adopted for text semantic matching. The model is shown in Fig. 5.
The combined twin LSTM-MatchPyramid model consists mainly of an embedding layer, a feature extraction layer, a fully connected layer, and an output layer.
(1) Embedding layer: The questions are first segmented into words with the jieba word segmentation tool; the words of the user's question and of the questions in the question-answer library are then initialized as word2vec vectors, which serve as the input to the model.
(2) Feature extraction layer: In the twin LSTM, the two LSTM branches share the same weights, and a pooling layer produces the feature representation of each of the two sentences. In the MatchPyramid model, a word-level similarity matrix of dimension M × N is constructed between the two sentences, where M and N are the lengths of the two sentences and each entry is the dot product of the corresponding word vectors. Features are then extracted with a two-layer CNN: the first convolutional layer computes features for the two sentences separately, and the second layer sums them.
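The core of the MatchPyramid branch, the word-level similarity matrix, can be illustrated in plain Python. This sketch assumes each sentence is already a list of word vectors; it shows the M × N dot-product matrix followed by a single convolution (with ReLU) and a global max-pool, and does not reproduce the exact two-layer CNN configuration, kernel sizes, or the twin LSTM branch. All names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def similarity_matrix(s1: List[Vector], s2: List[Vector]) -> Matrix:
    """M x N matrix: entry (i, j) is the dot product of word i of s1 and word j of s2."""
    return [[dot(w1, w2) for w2 in s2] for w1 in s1]


def conv2d_relu(m: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2-D convolution (cross-correlation, no padding) followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(m) - kh + 1, len(m[0]) - kw + 1
    if out_h < 1 or out_w < 1:
        raise ValueError("kernel is larger than the similarity matrix")
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = sum(kernel[a][b] * m[i + a][j + b] for a in range(kh) for b in range(kw))
            row.append(max(0.0, acc))
        out.append(row)
    return out


def global_max_pool(m: Matrix) -> float:
    """Strongest local matching signal in the feature map."""
    return max(max(row) for row in m)
```

The matrix keeps word-to-word interactions that a per-sentence encoder would discard, which is the property the text credits MatchPyramid with.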
(3) Fully connected layer and output: After feature extraction, the three feature vectors (the two sentence representations from the twin LSTM and the matching features from MatchPyramid) are concatenated and fed into a fully connected network, which is trained with a cross-entropy loss to classify the sentence pair and output the result. The loss function is:
$$L_i = -\left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right] \qquad (2)$$
where $y_i$ denotes the label of sample $i$ and $p_i$ denotes the predicted probability that sample $i$ belongs to the positive class.
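The output stage and formula (2) can be sketched as below. For brevity the fully connected network is reduced to a single linear unit with a sigmoid (an assumption; the source does not give the network's depth or width), and the helper names are hypothetical. The loss clips $p_i$ away from 0 and 1 so the logarithms stay finite.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def match_probability(features: List[List[float]], weights: List[float], bias: float) -> float:
    """Concatenate the feature vectors and apply one fully connected unit + sigmoid."""
    x = [v for part in features for v in part]
    if len(x) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def bce_loss(y: int, p: float, eps: float = 1e-12) -> float:
    """Formula (2): L_i = -[y_i*log(p_i) + (1 - y_i)*log(1 - p_i)]."""
    p = min(max(p, eps), 1.0 - eps)  # keep both logarithms finite
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
```

With an untrained unit (all-zero inputs and bias) the predicted probability is 0.5, and a positive sample then costs log 2.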
The final visualization result is shown in Fig. 8.
It should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications and substitutions shall be covered by the claims of the present invention.

Claims (10)

1. A method for constructing a knowledge graph of the Liao dynasty historical and cultural domain, characterized by comprising the following steps:
step 1: designing the graph schema according to the entity categories;
step 2: acquiring the corresponding data according to the graph schema;
step 3: preprocessing the raw corpus;
step 4: performing named entity recognition and relation extraction on the preprocessed corpus;
step 5: constructing the knowledge graph.
2. The method for constructing a knowledge graph of the Liao dynasty historical and cultural domain according to claim 1, wherein, for step 1, the entity categories comprise: Chinese name, alternative names, cities, historical figures, languages, ethnic groups, military deployment, institutions, art forms, science and technology, foreign relations, clothing, hair ornaments, commerce, population, religion, folk customs, and architecture, wherein each entity category comprises a plurality of entities and each entity has corresponding attribute information describing its inherent characteristics; and, for step 2, structured, semi-structured, and unstructured data are acquired from encyclopedia websites, relevant books, and history websites by a web crawler according to the graph schema.
3. The method for constructing a knowledge graph of the Liao dynasty historical and cultural domain according to claim 2, wherein, for step 3, the jieba word segmentation tool is used to perform word segmentation and part-of-speech tagging on the data, and punctuation marks and stop words are removed.
4. The method for constructing a knowledge graph of the Liao dynasty historical and cultural domain according to claim 3, wherein, for step 4, the acquired semi-structured data is integrated and stored, entity recognition and relation extraction are performed on the unstructured data with a deep learning method, and knowledge fusion is then performed on the resulting data.
5. The method for constructing a knowledge graph of the Liao dynasty historical and cultural domain according to claim 4, wherein, in step 5, the data organized in step 4 is stored in Neo4j.
6. A method for constructing an intelligent question-answering system based on a knowledge graph of the Liao dynasty historical and cultural domain, characterized by comprising the following steps:
step 1: performing named entity recognition on a natural language question entered by the user;
step 2: performing intent recognition on the question;
step 3: searching the knowledge base for an answer and returning it;
step 4: constructing a question-answer library;
step 5: performing deep semantic matching against the question-answer library to generate the returned answer.
7. The method for constructing an intelligent question-answering system based on a knowledge graph of the Liao dynasty historical and cultural domain according to claim 6, wherein, for step 1, the question entered by the user is preprocessed and entity recognition is then performed with a deep learning method.
8. The method according to claim 7, wherein, for step 2, the question intent is identified with a TextCNN convolutional neural network.
9. The method for constructing an intelligent question-answering system based on a knowledge graph of the Liao dynasty historical and cultural domain according to claim 8, wherein, for step 3, the entity obtained in step 1 and the relation or attribute obtained in step 2 are used to construct a Cypher query statement for retrieving the answer from the Neo4j graph database.
10. The method for constructing an intelligent question-answering system based on a knowledge graph of the Liao dynasty historical and cultural domain according to claim 9, wherein, for step 4, if no corresponding triple is found by the query in step 3, relevant question-answering websites and forums are crawled with a crawler, and, for each question with multiple answers, the top 2 answers ranked by number of likes (with the earlier answer preferred) are stored in the question-answer library; and, for step 5, deep semantic matching is performed on the question-answer library with a twin network to obtain the answer.
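Claim 9 builds a Cypher query from the entity found in step 1 and the relation or attribute found in step 2. The sketch below assumes a hypothetical graph schema: nodes carry a `name` property, and the relation types and attribute keys in the allowlists are invented for illustration. The entity name is passed as a query parameter; relationship types cannot be parameterized in Cypher, so the relation is checked against an allowlist before it is written into the query string.

```python
from typing import Dict, Optional, Tuple

# Hypothetical schema vocabulary; a real system would derive these from the graph design.
ALLOWED_RELATIONS = {"HAS_ETHNIC_GROUP", "HAS_CAPITAL", "HAS_RELIGION"}
ALLOWED_ATTRIBUTES = {"founded", "abolished", "alias"}


def build_cypher(
    entity: str,
    relation: Optional[str] = None,
    attribute: Optional[str] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return (query, parameters) for a relation lookup or an attribute lookup."""
    params = {"name": entity}
    if relation is not None:
        # Relationship types cannot be query parameters, so validate before interpolating.
        if relation not in ALLOWED_RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        query = f"MATCH (e {{name: $name}})-[:{relation}]->(t) RETURN t.name AS answer"
        return query, params
    if attribute is not None:
        if attribute not in ALLOWED_ATTRIBUTES:
            raise ValueError(f"unknown attribute: {attribute}")
        params["attr"] = attribute
        # Dynamic property access lets the attribute key itself be a parameter.
        return "MATCH (e {name: $name}) RETURN e[$attr] AS answer", params
    raise ValueError("a relation or an attribute is required")
```

The returned pair could then be passed to the official Neo4j Python driver, e.g. `session.run(query, params)`; an empty result corresponds to the "no corresponding triple" case that triggers the question-answer library in claim 10.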

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011313409.0A CN112417100A (en) 2020-11-20 2020-11-20 Knowledge graph in Liaodai historical culture field and construction method of intelligent question-answering system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011313409.0A CN112417100A (en) 2020-11-20 2020-11-20 Knowledge graph in Liaodai historical culture field and construction method of intelligent question-answering system thereof

Publications (1)

Publication Number Publication Date
CN112417100A true CN112417100A (en) 2021-02-26

Family

ID=74777099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011313409.0A Pending CN112417100A (en) 2020-11-20 2020-11-20 Knowledge graph in Liaodai historical culture field and construction method of intelligent question-answering system thereof

Country Status (1)

Country Link
CN (1) CN112417100A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573028A (en) * 2015-01-14 2015-04-29 百度在线网络技术(北京)有限公司 Intelligent question-answer implementing method and system
CN106776711A (en) * 2016-11-14 2017-05-31 浙江大学 A kind of Chinese medical knowledge mapping construction method based on deep learning
CN107918634A (en) * 2017-06-27 2018-04-17 上海壹账通金融科技有限公司 Intelligent answer method, apparatus and computer-readable recording medium
CN109815340A (en) * 2019-01-17 2019-05-28 云南师范大学 A kind of construction method of national culture information resources knowledge mapping
CN110399496A (en) * 2019-07-02 2019-11-01 厦门耐特源码信息科技有限公司 A kind of knowledge mapping construction method based on CR decision tree
CN110569345A (en) * 2019-09-04 2019-12-13 淮阴工学院 Intelligent question-answering method for real-time knowledge based on entity link and relation prediction
CN111324691A (en) * 2020-01-06 2020-06-23 大连民族大学 Intelligent question-answering method for minority nationality field based on knowledge graph

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128233A (en) * 2021-05-11 2021-07-16 济南大学 Construction method and system of mental disease knowledge map
CN113297089A (en) * 2021-06-09 2021-08-24 南京大学 Crowd-sourcing assistant implementation method based on knowledge graph
CN113297089B (en) * 2021-06-09 2023-06-20 南京大学 Knowledge graph-based mass measurement assistant implementation method
CN113468304A (en) * 2021-06-28 2021-10-01 哈尔滨工程大学 Construction method of ship berthing knowledge question-answering query system based on knowledge graph
CN114238653A (en) * 2021-12-08 2022-03-25 华东师范大学 Method for establishing, complementing and intelligently asking and answering knowledge graph of programming education
CN114238653B (en) * 2021-12-08 2024-05-24 华东师范大学 Method for constructing programming education knowledge graph, completing and intelligently asking and answering
CN115357693A (en) * 2022-07-12 2022-11-18 浙江中控技术股份有限公司 Method for constructing intelligent question-answering system based on knowledge graph of hydrocracking device
CN115599902A (en) * 2022-12-15 2023-01-13 西南石油大学(Cn) Oil-gas encyclopedia question-answering method and system based on knowledge graph
CN117059261A (en) * 2023-08-21 2023-11-14 安徽农业大学 Livestock and poultry disease diagnosis method and system based on multi-mode knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination