CN116304110A

CN116304110A - Working method for constructing knowledge graph by using English vocabulary data

Info

Publication number: CN116304110A
Application number: CN202310336495.4A
Authority: CN
Inventors: 邓淄予
Original assignee: Chongqing Industry Polytechnic College
Current assignee: Chongqing Industry Polytechnic College
Priority date: 2023-03-30
Filing date: 2023-03-30
Publication date: 2023-06-23
Anticipated expiration: 2043-03-30
Also published as: CN116304110B

Abstract

The invention discloses a working method for constructing a knowledge graph by using English vocabulary data, comprising the following steps: S1, collecting English documents according to query keywords, extracting a plurality of English vocabulary keywords, aggregating the data to form a keyword English document data set, and performing similarity judgment; S2, grouping the data set after similarity judgment, performing a derivative classification operation on English documents containing keywords with different attributes, and performing relation mapping according to the derivative classification content; and S3, performing threshold judgment calculation on the derived and segmented keyword English documents, thereby constructing the attribute information corresponding to each keyword English document.

Description

Working method for constructing knowledge graph by using English vocabulary data

Technical Field

The invention relates to the field of big data analysis, in particular to a working method for constructing a knowledge graph by using English vocabulary data.

Background

In technological research and development, a large number of English documents in the relevant fields must be acquired in order to build up knowledge content. Once the English articles are acquired, indexes can be collected for the corresponding English keywords or core words, and an index-summarized English word relation graph can be provided. The prior art uses BPE word segmentation, which gives only a simple attribute description of the English documents in the corresponding technical field; it cannot carry out deep knowledge graph construction and cannot complete fine-grained knowledge structure mapping of massive English data. Those skilled in the art therefore need to solve this technical problem.

Disclosure of Invention

The invention aims to solve at least the technical problems existing in the prior art, and in particular provides an innovative working method for constructing a knowledge graph by using English vocabulary data.

To achieve the above object, the invention provides a working method for constructing a knowledge graph by using English vocabulary data, comprising the following steps:

S1, collecting English documents according to query keywords, extracting a plurality of English vocabulary keywords, aggregating the data to form a keyword English document data set, and performing similarity judgment;

S2, grouping the data set after similarity judgment, performing a derivative classification operation on English documents containing keywords with different attributes, and performing relation mapping according to the derivative classification content;

and S3, performing threshold judgment calculation on the derived and segmented keyword English documents, thereby constructing the attribute information corresponding to each keyword English document.

Preferably, in the above technical solution, S1 comprises:

S1-1, setting the corresponding query keywords according to the user's requirements, establishing a query keyword log, acquiring the English documents for the keywords, and initializing the English documents returned by the query keywords;

S1-2, constructing a collection model R of the keyword English documents,

where m keywords are set, the number of keyword English document data sets r is j, x is the English document category, y is the English document attribute, μ is the weight trained for the m-th keyword, and S is the keyword English document data value having the dual characteristics of category and attribute.

Preferably, in the above technical solution, S1 further comprises:

S1-3, after the collection of the data set is completed, performing pairwise similarity comparison on the keyword English documents, and comparing the collected keyword English documents with a preset reference document;

for the keyword English document similarity calculation, the training of the collection model R gradually converges to Y trained keywords; a similarity influence weight λ must first be defined, where z(s, d) is the number of occurrences of a keyword s of the keyword English documents in the English document d:

Preferably, in the above technical solution, S1 further comprises:

X_s is the keyword probability of the preset reference X, and Y_s is the keyword probability of the keyword English document, from which the corresponding similarity influence weights are calculated; i is the index of the keywords in the keyword English document;
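The occurrence count z(s, d) and the keyword probabilities X_s and Y_s can be illustrated with a minimal sketch. It assumes that a keyword probability is the keyword's relative frequency among the tracked keywords of a document; the text here does not state the exact definition, and the function names `keyword_counts` and `keyword_probabilities` are hypothetical.

```python
import re


def keyword_counts(document: str, keywords: list[str]) -> dict[str, int]:
    """Return z(s, d): the number of occurrences of each keyword s in document d."""
    text = document.lower()
    counts = {}
    for kw in keywords:
        # Case-insensitive whole-phrase match; \b avoids hits inside longer words.
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        counts[kw] = len(re.findall(pattern, text))
    return counts


def keyword_probabilities(document: str, keywords: list[str]) -> dict[str, float]:
    """Assumed definition: relative frequency of each keyword among all tracked keywords."""
    counts = keyword_counts(document, keywords)
    total = sum(counts.values())
    return {kw: (c / total if total else 0.0) for kw, c in counts.items()}


KEYWORDS = ["machine learning", "deep learning", "data mining"]
reference = "Deep learning extends machine learning. Machine learning uses data."
X = keyword_probabilities(reference, KEYWORDS)  # plays the role of X_s
```

The same function applied to a collected document would give the Y_s values to compare against the reference.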

the similarity T between the keyword English document and the preset reference is calculated as follows:

where D is a similarity factor; N is the maximum number of keywords used in the English document, which is at least 1 and at most the number of keywords in the English document; M(s_i) is the similarity between the keywords used in the English document and the preset reference. For example, documents on "Machine learning" and "Deep learning" are defined as English documents with similarity, as are documents on opera, operetta, and drama;

where

s' is the total number of keywords in the preset reference.
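The formula for T is not reproduced in this text, so the following sketch does not implement it. As a stand-in, it compares the keyword probability vectors of a document and the preset reference with cosine similarity, a different and well-known measure, only to show the shape of such a comparison. The function name and the sample probabilities are hypothetical.

```python
import math


def cosine_similarity(x: dict[str, float], y: dict[str, float]) -> float:
    """Cosine similarity between two keyword-probability vectors.

    Stand-in for the similarity T; not the formula of the source document.
    """
    keys = set(x) | set(y)
    dot = sum(x.get(k, 0.0) * y.get(k, 0.0) for k in keys)
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # a vector with no keyword mass is treated as dissimilar
    return dot / (norm_x * norm_y)


# Hypothetical keyword probabilities for the preset reference and one document.
reference_probs = {"machine learning": 0.5, "deep learning": 0.5}
document_probs = {"machine learning": 0.5, "deep learning": 0.25, "data mining": 0.25}
t_standin = cosine_similarity(reference_probs, document_probs)
```

A document whose keyword distribution matches the reference scores close to 1, and a document sharing no keywords scores 0.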

Preferably, in the above technical solution, S2 comprises:

S2-1, classifying the attributes of the keyword English documents according to the similarity T: a complete sample vector is formed by calculating the association information of keyword English documents with different attributes, set classification is performed using weak classifiers, and attribute classification is performed after optimization according to the classifier weights in the keyword English documents;

S2-2, the attribute classification is

where η_t is the association coefficient of the keyword English documents in sequence at time t; H is the attribute characteristic value: H = 0 denotes a negative keyword, H = 1 denotes a positive keyword, and any other value of H denotes a common or neutral keyword; L is the keyword English document data value with attribute characteristics; W_k is the total number of keyword English documents in the classifier that share category and attribute similarity; and the subscript k is the index of the keyword English document data value with attribute characteristics.
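The attribute classification formula is not reproduced in this text. The sketch below only illustrates the general idea of combining weighted weak classifiers into an attribute value H, with 1 for positive, 0 for negative, and another value for neutral or common keywords. The weak classifiers, the weights, the `margin` parameter, and the use of 0.5 for neutral are all assumptions made for illustration.

```python
from typing import Callable

# A weak classifier votes +1 (positive), -1 (negative), or 0 (abstain).
WeakClassifier = Callable[[str], int]


def contains(term: str, vote: int) -> WeakClassifier:
    """Hypothetical weak classifier: casts `vote` when `term` occurs in the text."""
    return lambda text: vote if term in text.lower() else 0


def attribute_value(
    text: str,
    classifiers: list[tuple[float, WeakClassifier]],
    margin: float = 0.2,
) -> float:
    """Weighted vote of weak classifiers, mapped to H (assumed mapping)."""
    total_weight = sum(weight for weight, _ in classifiers)
    if total_weight == 0:
        return 0.5
    score = sum(weight * clf(text) for weight, clf in classifiers) / total_weight
    if score > margin:
        return 1.0  # positive keyword
    if score < -margin:
        return 0.0  # negative keyword
    return 0.5  # neutral or common keyword


classifiers = [
    (0.6, contains("efficient", +1)),
    (0.4, contains("bias", -1)),
]
h_positive = attribute_value("An efficient training method", classifiers)
h_negative = attribute_value("Dataset bias in vision models", classifiers)
h_neutral = attribute_value("A survey of related work", classifiers)
```

In a trained ensemble such as AdaBoost the weights would come from the training procedure rather than being set by hand as they are here.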

Preferably, in the above technical solution, S3 comprises:

threshold judgment is performed on the keyword English documents after attribute classification, so that an attribute mapping operation is performed on the keyword English documents that have passed similarity judgment,

where β_k is the threshold of the keyword English document data value k with attribute characteristics;

the average of the thresholds of the keyword English documents of all attribute characteristics in the training data; ε (0 < ε < 1) denotes the attribute correlation judgment coefficient.
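The threshold formula itself is not reproduced in this text. One plausible reading, offered only as an assumption, is that a data value k is retained when its threshold β_k reaches ε times the average threshold. The sketch below implements that reading; the function name `threshold_filter` and the sample values are hypothetical.

```python
from statistics import mean


def threshold_filter(betas: dict[str, float], epsilon: float) -> list[str]:
    """Keep the items whose threshold beta_k is at least epsilon * mean(beta).

    Assumed decision rule; the source formula is not available here.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must satisfy 0 < epsilon < 1")
    if not betas:
        return []
    cutoff = epsilon * mean(list(betas.values()))
    return [k for k, beta in betas.items() if beta >= cutoff]


# Hypothetical thresholds for three keyword English document data values.
betas = {"doc_a": 0.9, "doc_b": 0.5, "doc_c": 0.1}
kept = threshold_filter(betas, epsilon=0.5)
```

With these values the average threshold is about 0.5 and the cutoff about 0.25, so `doc_a` and `doc_b` are kept and `doc_c` is dropped.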

In summary, due to the adoption of the technical scheme, the beneficial effects of the invention are as follows:

The invention abandons BPE word segmentation. An efficient similarity judgment model is used to train on the original English documents, sample errors in the training process are removed, and a screened English document training set is formed. Topic classification is performed according to the part-of-speech attributes of the English document semantics, the topic attribute of each English document is determined, and classification is performed according to the topic attribute. Threshold judgment is then performed on the original English documents and the topic-classified English documents. Together these steps constitute the knowledge graph construction for the English documents.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:

FIG. 1 is a general schematic of the present invention;

FIG. 2 is a schematic diagram of an embodiment of the present invention;

FIG. 3 is a schematic diagram of another embodiment of the present invention.

Detailed Description

Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.

As shown in FIGS. 1 to 3, the invention discloses a working method for constructing a knowledge graph by using English vocabulary data, comprising the following steps:

Preferably, in the above technical solution, S1 comprises:

S1-2, constructing a collection model R of the keyword English documents,

where m keywords are set, the number of keyword English document data sets r is j, x is the English document category, y is the English document attribute, μ is the weight trained for the m-th keyword, and S is the keyword English document data value having the dual characteristics of category and attribute,

the training initialization value is set to 0.1, training runs for 20 rounds, the matching keyword English document set is 5%, and the English documents are trained using a similarity optimizer. English vocabulary keywords are chosen by field; for example, for AI-related text in the software field: machine learning, deep learning, neural networks, artificial intelligence, natural language processing, computer vision, reinforcement learning, supervised learning, unsupervised learning, convolutional neural networks, recurrent neural networks, transfer learning, generative adversarial networks, Bayesian networks, support vector machines, decision trees, clustering, regression analysis, data mining, big data;

the keyword English documents are extracted, trained, and classified into a keyword English document data set, and English document similarity is analyzed according to keyword similarity;

where X_s is the keyword probability of the preset reference X, and Y_s is the keyword probability of the keyword English document, from which the corresponding similarity influence weights are calculated; i is the index of the keywords in the keyword English document;

where D is a similarity factor; N is the maximum number of keywords used in the English document, which is at least 1 and at most the number of keywords in the English document; M(s_i) is the similarity between the keywords used in the English document and the preset reference. For example, documents on "Machine learning" and "Deep learning" are defined as English documents with similarity, as are documents on opera, operetta, and drama;

where

s' is the total number of keywords in the preset reference;

Categories: literature, architecture, software, electronics, etc.;

Attributes: positive, negative, neutral, and common (shared similarity).

In literature, for example, tragedy is negative and comedy is positive. In software, for the learning of neural networks:

Positive keywords:

Efficiency: deep learning and neural network algorithms can efficiently handle large-scale data sets and complex tasks such as image recognition and natural language processing.

Adaptability: deep learning and neural networks can gradually adapt to new data and tasks through training and have strong generalization capability.

Accuracy: deep learning and neural networks have exceeded human performance in many tasks, and their accuracy will continue to improve as the technology develops.

Automation: deep learning and neural networks can automatically learn features and extract useful information, thereby reducing the need for human intervention.

Interpretability: in recent years, more and more research has been devoted to explaining the decision-making process of deep learning and neural networks in order to improve their interpretability.

Negative keywords:

Data requirements: deep learning and neural network algorithms typically require large amounts of data to train, which may limit their application in certain fields.

Complexity: deep learning and neural network algorithms are inherently complex, and their internal operating mechanisms are difficult to explain.

Vulnerability: deep learning and neural network algorithms may be subject to adversarial attacks, resulting in erroneous predictions.

Misunderstandings and misinterpretations: because of the complexity of deep learning and neural network algorithms, people often misunderstand and misinterpret their decision processes.

Bias: deep learning and neural network algorithms may be affected by bias in the data set, producing erroneous decisions.

Preferably, in the above technical solution, S2 comprises:

S2-2, the attribute classification is

To improve the accuracy of the result, and considering that classifiers for keyword English documents with the same category and attribute have errors, threshold judgment is used to make the errors converge.

Preferably, in the above technical solution, S3 comprises:

the average of the thresholds of the keyword English documents of all attribute characteristics in the training data; ε (0 < ε < 1) denotes the attribute correlation judgment coefficient.

Documents containing the corresponding English keywords must be classified during search. With existing word segmentation methods the classification process is too slow, whereas threshold judgment after attribute classification allows the corresponding content to be classified directly. This is equivalent to drawing a knowledge graph for the English documents, which makes them more intuitive and clear for readers and users and facilitates subsequent reference work.
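How classified documents might be laid out as a knowledge graph can be sketched as follows. The representation, a mapping from each node to a set of (relation, target) edges, together with the relation names and the sample records, is an assumption made for illustration; the source does not specify a graph format.

```python
from collections import defaultdict


def build_graph(records: list[dict]) -> dict[str, set[tuple[str, str]]]:
    """Build a simple graph: node -> set of (relation, target) edges."""
    graph: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for rec in records:
        doc = rec["doc"]
        graph[doc].add(("has_category", rec["category"]))
        graph[doc].add(("has_attribute", rec["attribute"]))
        for kw in rec["keywords"]:
            graph[doc].add(("mentions", kw))
            graph[kw].add(("mentioned_in", doc))  # reverse edge for keyword lookup
    return dict(graph)


# Hypothetical classified documents that passed the threshold judgment.
records = [
    {"doc": "paper_1", "category": "software", "attribute": "positive",
     "keywords": ["deep learning", "efficiency"]},
    {"doc": "paper_2", "category": "software", "attribute": "negative",
     "keywords": ["deep learning", "bias"]},
]
graph = build_graph(records)
related = sorted(target for rel, target in graph["deep learning"] if rel == "mentioned_in")
```

Looking up a keyword node then returns every document that mentions it, which is the kind of navigation the passage describes as convenient for readers.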

While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims

1. A working method for constructing a knowledge graph by using English vocabulary data, comprising:

2. The working method for constructing a knowledge graph by using English vocabulary data according to claim 1, wherein S1 comprises:

S1-2, constructing a collection model R of the keyword English documents,

3. The working method for constructing a knowledge graph by using English vocabulary data according to claim 1, wherein S1 further comprises:

4. The working method for constructing a knowledge graph by using English vocabulary data according to claim 1, wherein S1 further comprises:

where

s' is the total number of keywords in the preset reference.

5. The working method for constructing a knowledge graph by using English vocabulary data according to claim 1, wherein S2 comprises:

S2-2, the attribute classification is

6. The working method for constructing a knowledge graph by using English vocabulary data according to claim 1, wherein S3 comprises: