CN114372150A - Knowledge graph construction method, system, device and storage medium - Google Patents
- Publication number: CN114372150A
- Application number: CN202111505685.1A
- Authority: CN (China)
- Prior art keywords: word, vector, determining, text data, tuple
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a knowledge graph construction method, system, device and storage medium, and relates to the technical field of computers. The knowledge graph construction method comprises the following steps: acquiring text data; processing the text data to obtain a plurality of word parameters; determining a difference vector according to the text data and the plurality of word parameters; and updating a relation rule base according to the difference vector. The method uses principal component analysis to remove the information that an extracted relation rule shares with the current relation rule base, and updates the relation rule base with the newly extracted relation rule only when the similarity between the difference vectors of two successively extracted relation rules is greater than a preset value. Newly generated relation rules are thereby screened more effectively, the amount of calculation is reduced, and the accuracy of knowledge graph relation extraction is improved.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a knowledge graph construction method, a knowledge graph construction system, a knowledge graph construction device and a storage medium.
Background
In constructing an industry knowledge graph, extracting the relations between entities in text is a key and difficult problem. Traditional methods based on rule identification have high labor cost and low recall. Methods based on supervised learning require a large amount of labeled sample data, which is labor-intensive. Methods based on semi-supervised learning reduce the labor input, but their accuracy drops rapidly as the number of iterations increases.
Disclosure of Invention
The present invention is directed to solving at least one of the problems of the prior art. Therefore, the invention provides a method, a system, a device and a storage medium for constructing a knowledge graph, which can improve the accuracy of the knowledge graph while improving the construction efficiency of the knowledge graph.
In one aspect, an embodiment of the present invention provides a method for constructing a knowledge graph, including the following steps:
acquiring text data;
processing the text data to obtain a plurality of word parameters;
determining a difference vector from the text data and the plurality of word parameters, wherein the difference vector is determined by:
acquiring a relation rule base and a tuple database;
extracting a relation rule in the text data according to first tuple data in the tuple database;
determining sentence vectors according to the relation rules and the word parameters;
determining principal component feature vectors of the relation rule base through a principal component analysis method;
determining a difference vector in the sentence vector according to the principal component feature vector;
updating a relational rule base according to the difference vector, wherein the relational rule base is updated by:
extracting the text data according to the relation rule base to obtain second tuple data, and updating the tuple database;
determining a new difference vector and a new relation rule according to the text data and the plurality of word parameters based on the new tuple database;
and determining the similarity between the new difference vector and the difference vector, and updating the relation rule base according to the new relation rule when the similarity is greater than a preset value.
According to some embodiments of the invention, the word parameters comprise word vectors and tf-idf values, and the processing the text data to obtain a plurality of word parameters comprises:
performing word segmentation processing on the text data to obtain a plurality of words;
determining a word frequency of each of the words in the text data;
determining the tf-idf value of each word according to the word frequency;
determining a word vector for each of the words via a neural network coding model.
According to some embodiments of the invention, the determining a sentence vector according to the relationship rule and the word parameter comprises the steps of:
determining a plurality of the word parameters included in the relationship rule;
determining the sentence vector from a plurality of the word parameters, wherein the sentence vector is determined by the formula:
wherein $S$ represents the sentence vector, $n$ represents the number of words contained in the sentence vector, $t_i$ represents the tf-idf value of the i-th word, and $V_i$ represents the word vector of the i-th word.
According to some embodiments of the invention, the determining a difference vector in the sentence vector according to the principal component feature vector comprises:
determining a first principal component in the principal component feature vector;
and determining the difference between the sentence vector and the projection of the sentence vector on the first principal component to obtain the difference vector.
According to some embodiments of the invention, the difference vector is determined by the following formula:
$S_d = S - u u^{T} S$;
wherein $S_d$ represents the difference vector, $S$ represents the sentence vector, $u$ represents the first principal component, and $u^{T}$ represents the transpose of the first principal component.
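As a brief check of the formula above, and assuming that $u$ is normalised to unit length (the usual convention for principal components, although the text does not state it), the difference vector has no remaining component along $u$:

```latex
\begin{aligned}
u^{T} S_d &= u^{T}\left(S - u u^{T} S\right) \\
          &= u^{T} S - \left(u^{T} u\right) u^{T} S \\
          &= u^{T} S - u^{T} S = 0 .
\end{aligned}
```

Under that assumption, $S_d$ keeps only the part of the sentence vector that the first principal component of the relation rule base does not explain.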
According to some embodiments of the present invention, the word parameter further includes an entity type, and the processing the text data to obtain the plurality of word parameters further includes:
inputting the word vector of the word into an entity recognition model to obtain the entity type of the word;
the step of extracting the text data according to the relation rule base to obtain second tuple data and updating the tuple database comprises the following steps of:
extracting the text data according to the relation rule base to obtain a plurality of second tuple data;
selecting second tuple data which is the same as a preset entity type according to the entity type of the word;
adding second tuple data with the same type as the preset entity into the tuple database to update the tuple database.
According to some embodiments of the invention, the method of knowledge-graph construction comprises the steps of:
repeatedly executing the step of updating the relation rule base according to the difference vector, so as to update the tuple database and the relation rule base, and stopping the updating once the similarity is less than the preset value.
On the other hand, the embodiment of the invention also provides a knowledge graph construction system, which comprises:
a first module for acquiring text data;
a second module for processing the text data to obtain a plurality of word parameters;
a third module for determining a difference vector from the text data and the plurality of word parameters, wherein the difference vector is determined by:
acquiring a relation rule base and a tuple database;
extracting a relation rule in the text data according to first tuple data in the tuple database;
determining sentence vectors according to the relation rules and the word parameters;
determining principal component feature vectors of the relation rule base through a principal component analysis method;
determining a difference vector in the sentence vector according to the principal component feature vector;
a fourth module for updating a relational rule base according to the difference vector, wherein the relational rule base is updated by:
extracting the text data according to the relation rule base to obtain second tuple data, and updating the tuple database;
determining a new difference vector and a new relation rule according to the text data and the plurality of word parameters based on the new tuple database;
and determining the similarity between the new difference vector and the difference vector, and updating the relation rule base according to the new relation rule when the similarity is greater than a preset value.
On the other hand, the embodiment of the invention also provides a knowledge graph construction device, which comprises:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the knowledge graph construction method described above.
In another aspect, the embodiment of the present invention further provides a computer-readable storage medium, which stores computer-executable instructions for causing a computer to execute the method for constructing a knowledge graph as described above.
The technical scheme of the invention has at least one of the following advantages or beneficial effects. The knowledge graph is constructed by extracting relation rules and second tuple data from the acquired text data, so that the tuple database and the relation rule base are continuously updated. A sentence vector is determined from each extracted relation rule, and principal component analysis is used to determine the difference vector between the sentence vector and the principal component feature vector of the current relation rule base, so that relation rules that differ to some degree from the current relation rule base are selected for further analysis. Two difference vectors are then obtained from the tuple database before and after its update, and when their similarity is greater than a preset value, the currently extracted relation rule is added to the relation rule base. In this way, principal component analysis removes the information that an extracted relation rule shares with the current relation rule base, and the relation rule base is updated only when the similarity between the difference vectors of two successively extracted relation rules is greater than the preset value, so that newly generated relation rules are screened more effectively, the amount of calculation is reduced, and the accuracy of knowledge graph relation extraction is improved.
Drawings
FIG. 1 is a flow chart of a knowledge graph construction method provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a knowledge graph building system provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of a knowledge graph construction device provided by an embodiment of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or components having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplicity of description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, if there are first, second, etc. described, they are only used for distinguishing technical features, but they are not interpreted as indicating or implying relative importance or implicitly indicating the number of indicated technical features or implicitly indicating the precedence of the indicated technical features.
Referring to fig. 1, the method for constructing a knowledge graph according to an embodiment of the present invention includes, but is not limited to, step S100, step S200, step S300, and step S400.
Step S100, acquiring text data;
step S200, processing the text data to obtain a plurality of word parameters;
step S300, determining a difference vector according to the text data and the plurality of word parameters, wherein the difference vector is determined through the following steps:
step S310, a relation rule base and a tuple database are obtained;
step S320, extracting a relation rule in the text data according to the first tuple data in the tuple database;
step S330, determining sentence vectors according to the relation rules and the word parameters;
step S340, determining principal component characteristic vectors of the relation rule base through a principal component analysis method;
step S350, determining a difference vector in the sentence vector according to the principal component feature vector;
step S400, updating the relation rule base according to the difference vector, wherein the relation rule base is updated through the following steps:
step S410, extracting the text data according to the relation rule base to obtain second tuple data, and updating the tuple database;
step S420, determining a new difference vector and a new relation rule according to the text data and the word parameters based on the new tuple database;
and step S430, determining the similarity between the new difference vector and the difference vector, and updating the relation rule base according to the new relation rule when the similarity is greater than a preset value.
Specifically, the text data may be obtained from a network or input manually. The text data is then processed to obtain a plurality of word parameters; for example, after word segmentation, weight calculation, encoding, entity type identification and the like, the word parameters of a plurality of words are obtained, which may include tf-idf values, word vectors, entity types and the like. Next, a manually initialized tuple database is obtained, the corresponding relation rules are extracted from the text data according to the initial first tuple data in the tuple database, and the relation rules are added to a relation rule base to initialize it. For example, if the initial first tuple data is "apple belongs to fruit" and "fruit belongs to food", the relation rule between "strawberry" and "food" in the text data, namely "strawberry belongs to food", can be identified according to the first tuple data. Then, a sentence vector $S_1$ is generated from the word parameters of the words in the extracted relation rule, such as "strawberry" and "food"; the principal component feature vector $W_1$ of the relation rule base is calculated by principal component analysis; and a difference vector $S_{d1}$ is determined from the sentence vector $S_1$ and the principal component feature vector $W_1$. The text data is then extracted according to the relation rule base to obtain second tuple data, which is added to the tuple database to update it. Based on the new tuple database, a new difference vector $S_{d2}$ and a new relation rule are determined from the text data and the word parameters, and the similarity between $S_{d2}$ and $S_{d1}$ is calculated; when the similarity is greater than a preset value, the new relation rule is added to the relation rule base to update it.
The text data then continues to be extracted according to the relation rule base to obtain second tuple data, which is added to the tuple database to update it. Based on the new tuple database, a new difference vector $S_{d3}$ and a new relation rule are determined from the text data and the word parameters, and the similarity between $S_{d3}$ and $S_{d2}$ is calculated; when the similarity is greater than the preset value, the new relation rule is added to the relation rule base to update it. The process continues in this way, continuously updating the tuple database and the relation rule base until the numbers of relation rules and tuple data no longer increase, that is, until the similarity is smaller than the preset value, at which point the construction of the knowledge graph is complete.
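The comparison of successive difference vectors described above can be sketched as follows. The text does not say which similarity measure is used, so cosine similarity is assumed here; the function names, the example vectors and the preset value of 0.8 are likewise hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def should_add_rule(previous_diff, new_diff, preset_value=0.8):
    """True when the new relation rule should be added to the rule base."""
    return cosine_similarity(previous_diff, new_diff) > preset_value

# Hypothetical difference vectors from three successive extractions.
diff_1 = [0.2, 0.9, 0.1]
diff_2 = [0.25, 0.85, 0.05]
diff_3 = [0.9, -0.1, 0.3]

add_second_rule = should_add_rule(diff_1, diff_2)  # vectors point the same way
add_third_rule = should_add_rule(diff_2, diff_3)   # vectors diverge
```

With these example values the second rule would be accepted and the third rejected, which corresponds to the stopping condition in the paragraph above.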
In this embodiment, the knowledge graph is constructed by extracting relation rules and tuple data from the acquired text data, so that the tuple database and the relation rule base are continuously updated. A sentence vector is determined from each extracted relation rule, and principal component analysis is used to determine the difference vector between the sentence vector and the principal component feature vector of the current relation rule base, so that relation rules that differ to some degree from the current relation rule base are selected for further analysis. Two difference vectors are then obtained from the tuple database before and after its update, and when their similarity is greater than a preset value, the currently extracted relation rule is added to the relation rule base. In this way, principal component analysis removes the information that an extracted relation rule shares with the current relation rule base, and the relation rule base is updated only when the similarity between the difference vectors of two successively extracted relation rules is greater than the preset value, so that newly generated relation rules are screened more effectively, the amount of calculation is reduced, and the accuracy of knowledge graph relation extraction is improved.
According to some embodiments of the invention, the word parameters include a word vector and tf-idf values, and step S200 includes, but is not limited to, the following steps:
step S210, performing word segmentation processing on the text data to obtain a plurality of words;
step S220, determining the word frequency of each word in the text data;
step S230, determining the tf-idf value of each word according to the word frequency;
step S240, determining a word vector of each word through a neural network coding model.
Specifically, after word segmentation processing is carried out on text data to obtain a plurality of words, word frequency of each word in the text data is determined, then based on TF-IDF technology, TF-IDF value of each word is determined according to the word frequency, and then word vector of each word is determined through a neural network coding model.
It should be noted that TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining. TF denotes term frequency and IDF denotes inverse document frequency.
It should be noted that the neural network coding model may be a word2vec model or a fasttext model.
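Steps S210 to S230 can be illustrated with a minimal sketch. The text does not fix a particular tf-idf variant, so the common form tf = count / document length and idf = log(N / document frequency) is assumed; the function name `tf_idf` and the pre-segmented example documents are hypothetical, and the word vectors of step S240 would come from a separately trained model such as word2vec or fasttext.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: tf-idf value} dict per tokenised document.

    Assumed variant: tf = count / document length,
    idf = log(number of documents / documents containing the word).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for words in documents:
        doc_freq.update(set(words))  # count each word once per document
    scores = []
    for words in documents:
        counts = Counter(words)      # word frequency, as in step S220
        total = len(words)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores

# Two already-segmented example sentences (the output of step S210).
docs = [
    ["strawberry", "belongs", "to", "food"],
    ["apple", "belongs", "to", "fruit"],
]
weights = tf_idf(docs)
```

Words that occur in every document, such as "belongs" here, receive a weight of zero under this variant, so they contribute nothing to a tf-idf-weighted sentence vector.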
According to some embodiments of the invention, step S330 includes, but is not limited to, the following steps:
step S331, determining a plurality of word parameters contained in the relation rule;
step S332, determining a sentence vector according to the word parameters, wherein the sentence vector is determined by the following formula:
wherein $S$ represents the sentence vector, $n$ represents the number of words contained in the sentence vector, $t_i$ represents the tf-idf value of the i-th word, and $V_i$ represents the word vector of the i-th word.
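The formula itself is not reproduced in this text. Given the symbols defined above, one plausible reading is a tf-idf-weighted average of the word vectors, $S = \frac{1}{n}\sum_{i=1}^{n} t_i V_i$; the sketch below implements that assumed form only and should not be taken as the patented formula. The function name `sentence_vector` is hypothetical.

```python
def sentence_vector(tfidf_values, word_vectors):
    """Assumed form: S = (1/n) * sum_i t_i * V_i."""
    n = len(word_vectors)
    if n == 0 or n != len(tfidf_values):
        raise ValueError("need one tf-idf value per word vector")
    dim = len(word_vectors[0])
    total = [0.0] * dim
    for t, vec in zip(tfidf_values, word_vectors):
        for k in range(dim):
            total[k] += t * vec[k]  # weight each word vector by its tf-idf
    return [x / n for x in total]

# Two words with 2-dimensional word vectors and tf-idf values 0.5 and 1.5.
S = sentence_vector([0.5, 1.5], [[1.0, 0.0], [0.0, 1.0]])
```

Under this reading, words with a higher tf-idf value pull the sentence vector more strongly toward their own word vector.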
According to some embodiments of the invention, step S350 includes, but is not limited to, the following steps:
step S351, determining a first principal component in the principal component feature vector;
in step S352, the difference between the sentence vector and the projection value of the sentence vector on the first principal component is determined to obtain a difference vector.
Specifically, the difference vector is determined by the following formula:
$S_d = S - u u^{T} S$;
wherein $S_d$ represents the difference vector, $S$ represents the sentence vector, $u$ represents the first principal component, and $u^{T}$ represents the transpose of the first principal component.
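A minimal NumPy sketch of steps S340 to S352 follows. It assumes that the relation rule base is available as a matrix with one sentence vector per row, that the rows are mean-centred before the decomposition, and that the first principal component is taken from a singular value decomposition; none of these details is specified in the text, and the function names and example values are hypothetical.

```python
import numpy as np

def first_principal_component(rule_vectors):
    """Unit-length first principal direction of the rows of rule_vectors."""
    X = np.asarray(rule_vectors, dtype=float)
    X = X - X.mean(axis=0)  # centring is an assumption of this sketch
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]            # right singular vector of the largest singular value

def difference_vector(S, u):
    """S_d = S - u u^T S for a unit vector u."""
    S = np.asarray(S, dtype=float)
    return S - u * (u @ S)

# Hypothetical sentence vectors of the rules already in the rule base.
rule_base = [[2.0, 0.1], [4.0, -0.1], [6.0, 0.1], [8.0, -0.1]]
u = first_principal_component(rule_base)
S_d = difference_vector([3.0, 1.0], u)
```

Because `u` has unit length, `S_d` is orthogonal to it, so whatever the new rule shares with the dominant direction of the rule base is removed.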
It should be noted that the principal component feature vector generally includes a plurality of principal components: the first principal component represents the most information common to the relation rules in the relation rule base, the second principal component represents the next most, and so on. The embodiment of the present invention is not limited to obtaining the difference vector as the difference between the sentence vector and its projection on the first principal component; the difference vector may also be obtained as the difference between the sentence vector and its projection on all principal components, or on the first several principal components.
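The variant that removes the first several principal components can be sketched in the same way. As before, the mean-centring, the use of a singular value decomposition, the function name `remove_top_components` and the example values are assumptions made for illustration.

```python
import numpy as np

def remove_top_components(S, rule_vectors, k=1):
    """Subtract from S its projection onto the first k principal directions."""
    X = np.asarray(rule_vectors, dtype=float)
    X = X - X.mean(axis=0)   # centring is an assumption of this sketch
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    U = vt[:k].T             # shape (d, k), orthonormal columns
    S = np.asarray(S, dtype=float)
    return S - U @ (U.T @ S)

# Hypothetical 3-dimensional sentence vectors of the rule base.
rules = [[2.0, 0.1, 0.0], [4.0, -0.1, 0.3], [6.0, 0.1, -0.3], [8.0, -0.1, 0.0]]
S = [3.0, 1.0, 0.5]
S_d_one = remove_top_components(S, rules, k=1)
S_d_two = remove_top_components(S, rules, k=2)
```

Each additional component removed can only shrink the difference vector, so a larger `k` discards more of what the new rule has in common with the rule base; removing every component of the space would leave a zero vector.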
According to some embodiments of the present invention, the word parameter further includes an entity type, and step S200 further includes, but is not limited to, the following steps:
step S250, inputting the word vector of the word into the entity recognition model to obtain the entity type of the word;
step S410 includes, but is not limited to, the following steps:
step S411, extracting the text data according to a relation rule base to obtain a plurality of second tuple data;
step S412, selecting second tuple data which is the same as the preset entity type according to the entity type of the word;
in step S413, adding the second tuple data with the same type as the preset entity into the tuple database to update the tuple database.
Specifically, after the text data is extracted according to the relation rule base to obtain a plurality of second tuple data, the second tuple data whose entity type matches the preset entity type is selected according to the entity types of the words in the second tuple data. For example, if the preset entity type is a place name, the second tuple data whose entity type is a place name is selected from the plurality of second tuple data. The selected second tuple data is then added to the tuple database to update it, so that the knowledge graph can be constructed around the required theme, which improves the efficiency and accuracy of knowledge graph construction.
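Steps S411 to S413 can be sketched as a simple filter. The record layout `TupleData`, the rule that a tuple is kept when at least one of its entities has the preset type, and the example tuples are all assumptions; the text only states that tuples matching the preset entity type are selected.

```python
from typing import NamedTuple

class TupleData(NamedTuple):
    """Hypothetical record for one extracted tuple and its entity types."""
    head: str
    relation: str
    tail: str
    head_type: str
    tail_type: str

def filter_by_entity_type(candidates, preset_type):
    """Keep tuples in which at least one entity has the preset type (assumed rule)."""
    return [
        t for t in candidates
        if t.head_type == preset_type or t.tail_type == preset_type
    ]

# Candidate second tuple data, as if produced by step S411.
candidates = [
    TupleData("Guangzhou", "located in", "Guangdong", "place", "place"),
    TupleData("strawberry", "belongs to", "food", "food", "food"),
]
tuple_database = set()
tuple_database.update(filter_by_entity_type(candidates, "place"))  # step S413
```

Storing the tuple database as a set means that a tuple extracted again in a later round is not added twice.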
According to some embodiments of the present invention, the method for constructing a knowledge graph further includes, but is not limited to, the following steps:
step S600, repeatedly executing the step of updating the relation rule base according to the difference vector, so as to update the tuple database and the relation rule base, and stopping the updating once the similarity is smaller than the preset value.
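The repeated update of step S600 can be sketched as a loop around two extraction callbacks. The callbacks `extract_rule` and `extract_tuples`, the cosine similarity measure, the `max_rounds` safety limit and the toy stand-ins below are all hypothetical; they only illustrate the control flow, not the extraction itself.

```python
import math

def cosine(a, b):
    """Cosine similarity (assumed measure; the text does not name one)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def build(extract_rule, extract_tuples, tuple_db, rule_base,
          preset_value, max_rounds=100):
    """Run steps S300 and S400 repeatedly; return the number of rules accepted."""
    rule, previous_diff = extract_rule(tuple_db)     # step S300
    rule_base.add(rule)                              # initialise the rule base
    accepted = 0
    for _ in range(max_rounds):
        tuple_db.update(extract_tuples(rule_base))   # step S410
        new_rule, new_diff = extract_rule(tuple_db)  # step S420
        if cosine(previous_diff, new_diff) <= preset_value:
            break                                    # step S600: stop updating
        rule_base.add(new_rule)                      # step S430
        previous_diff = new_diff
        accepted += 1
    return accepted

# Toy stand-ins so that the sketch runs; real extractors would read the text data.
_diffs = iter([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])

def toy_extract_rule(tuple_db):
    return f"rule-{len(tuple_db)}", next(_diffs)

def toy_extract_tuples(rule_base):
    return {f"tuple-from-{rule}" for rule in rule_base}

tuple_db = {"seed-tuple"}
rule_base = set()
accepted = build(toy_extract_rule, toy_extract_tuples, tuple_db, rule_base,
                 preset_value=0.8)
```

The text leaves the case of similarity exactly equal to the preset value open; this sketch treats it as a stop.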
The embodiment of the present invention further provides a knowledge graph construction system, referring to fig. 2, including:
a first module for acquiring text data;
a second module for processing the text data to obtain a plurality of word parameters;
a third module for determining a difference vector from the text data and the plurality of word parameters, wherein the difference vector is determined by:
acquiring a relation rule base and a tuple database;
extracting a relation rule in the text data according to first tuple data in the tuple database;
determining sentence vectors according to the relation rules and the word parameters;
determining principal component characteristic vectors of a relation rule base through a principal component analysis method;
determining a difference vector in the sentence vector according to the principal component feature vector;
a fourth module for updating the relational rule base according to the difference vector, wherein the relational rule base is updated by:
extracting the text data according to the relation rule base to obtain second tuple data, and updating the tuple database;
determining a new difference vector and a new relation rule according to the text data and the plurality of word parameters based on the new tuple database;
and determining the similarity between the new difference vector and the difference vector, and updating the relation rule base according to the new relation rule when the similarity is greater than a preset value.
It can be understood that the contents of the knowledge graph construction method embodiments all apply to the system embodiments; the functions implemented by the system embodiments are the same as those of the method embodiments, and the beneficial effects achieved by the system embodiments are likewise the same as those achieved by the method embodiments.
Referring to fig. 3, fig. 3 is a schematic diagram of a knowledge graph constructing apparatus according to an embodiment of the present invention. The knowledge graph constructing device of the embodiment of the invention comprises one or more control processors and memories, and one control processor and one memory are taken as an example in fig. 3.
The control processor and the memory may be connected by a bus or other means, as exemplified by the bus connection in fig. 3.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located from the control processor, and the remote memory may be connected to the knowledge-graph constructing apparatus via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Those skilled in the art will appreciate that the configuration of the apparatus shown in FIG. 3 does not constitute a limitation of the knowledge-graph building apparatus and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
The non-transitory software programs and instructions required to implement the method of knowledge-graph construction applied to the knowledge-graph constructing apparatus in the above-described embodiments are stored in a memory and, when executed by a control processor, perform the method of knowledge-graph construction applied to the knowledge-graph constructing apparatus in the above-described embodiments.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium, which stores computer-executable instructions, which are executed by one or more control processors, and can make the one or more control processors execute the method for constructing the knowledge graph in the method embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.
Claims (10)
1. A knowledge graph construction method is characterized by comprising the following steps:
acquiring text data;
processing the text data to obtain a plurality of word parameters;
determining a difference vector from the text data and the plurality of word parameters, wherein the difference vector is determined by:
acquiring a relation rule base and a tuple database;
extracting a relation rule in the text data according to first tuple data in the tuple database;
determining sentence vectors according to the relation rules and the word parameters;
determining principal component feature vectors of the relation rule base through a principal component analysis method;
determining a difference vector in the sentence vector according to the principal component feature vector;
updating the relation rule base according to the difference vector, wherein the relation rule base is updated by:
extracting the text data according to the relation rule base to obtain second tuple data, and updating the tuple database;
determining a new difference vector and a new relation rule according to the text data and the plurality of word parameters based on the new tuple database;
and determining a similarity between the new difference vector and the difference vector, and updating the relation rule base according to the new relation rule when the similarity is greater than a preset value.
2. The method of constructing a knowledge graph according to claim 1, wherein the word parameters include word vectors and tf-idf values, and the processing the text data to obtain a plurality of word parameters includes the steps of:
performing word segmentation processing on the text data to obtain a plurality of words;
determining a word frequency of each of the words in the text data;
determining the tf-idf value of each word according to the word frequency;
determining a word vector for each of the words via a neural network coding model.
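The word-frequency and tf-idf steps of claim 2 can be sketched as follows. This is a minimal illustration, not the claimed implementation: whitespace splitting stands in for the word segmentation step, each sentence is treated as one document, the idf term is assumed to be log(N / df), and the function name `tf_idf` is hypothetical. The neural network coding model that produces the word vectors is not shown.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: tf-idf value} dict per tokenized document.

    Assumes tf = count / document length and idf = log(N / df).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores

# Whitespace splitting is only a stand-in for a real word segmenter.
docs = [s.split() for s in ["strawberry is a fruit", "fruit is a food"]]
weights = tf_idf(docs)
```

Under these assumptions, a word that appears in every document (such as "fruit" here) gets a weight of zero, while a word unique to one document gets the largest weight.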
3. The method of knowledge-graph construction according to claim 2, wherein said determining sentence vectors according to said relationship rules and said word parameters comprises the steps of:
determining a plurality of the word parameters included in the relationship rule;
determining the sentence vector from a plurality of the word parameters, wherein the sentence vector is determined by the formula:
wherein $S$ represents the sentence vector, $n$ represents the number of words contained in the sentence vector, $t_i$ represents the tf-idf value of the $i$-th word, and $V_i$ represents the word vector of the $i$-th word.
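The formula of claim 3 is not reproduced in this text, so the sketch below rests on an assumption: that the sentence vector is the tf-idf-weighted average of the word vectors, S = (1/n) Σ t_i V_i, which is one form consistent with the symbols defined above. The function name `sentence_vector` is hypothetical.

```python
def sentence_vector(tfidf_values, word_vectors):
    """Assumed form: S = (1/n) * sum_i t_i * V_i (not confirmed by the source)."""
    n = len(word_vectors)
    if n == 0 or n != len(tfidf_values):
        raise ValueError("need one tf-idf value per word vector, and at least one word")
    dim = len(word_vectors[0])
    total = [0.0] * dim
    # Accumulate each word vector scaled by its tf-idf value.
    for t, v in zip(tfidf_values, word_vectors):
        for k in range(dim):
            total[k] += t * v[k]
    return [x / n for x in total]

# Two words with 2-dimensional vectors and tf-idf values 0.5 and 1.0.
s = sentence_vector([0.5, 1.0], [[1.0, 0.0], [0.0, 2.0]])
```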
4. The method of constructing a knowledge graph according to claim 3, wherein the determining a difference vector in the sentence vector according to the principal component feature vector comprises the steps of:
determining a first principal component in the principal component feature vector;
and determining the difference between the sentence vector and the projection value of the sentence vector on the first principal component to obtain the difference vector.
5. The method of knowledge-graph construction according to claim 4, wherein the difference vector is determined by the following formula:
$S_d = S - u u^{T} S$;
wherein $S_d$ represents the difference vector, $S$ represents the sentence vector, $u$ represents the first principal component, and $u^{T}$ represents the transpose of the first principal component.
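Claims 4 and 5 can be illustrated with a short sketch that removes from a sentence vector its projection onto the first principal component. Several points are assumptions rather than statements of the claimed method: the first principal component is approximated by power iteration instead of a full principal component analysis, the rule-base vectors are not mean-centered, and the names `first_principal_component` and `difference_vector` are hypothetical. Power iteration can also stall if the starting vector happens to be orthogonal to the leading direction.

```python
import math

def first_principal_component(vectors, iterations=200):
    """Approximate the leading unit eigenvector of X^T X by power iteration."""
    dim = len(vectors[0])
    u = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        # w = X^T (X u), computed without forming the matrix X^T X.
        proj = [sum(a * b for a, b in zip(x, u)) for x in vectors]
        w = [sum(p * x[k] for p, x in zip(proj, vectors)) for k in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        u = [c / norm for c in w]
    return u

def difference_vector(s, u):
    """S_d = S - u u^T S: subtract the projection of S onto the unit vector u."""
    scale = sum(a * b for a, b in zip(u, s))  # the scalar u^T S
    return [a - scale * b for a, b in zip(s, u)]

# Toy rule-base sentence vectors that all lie along the first axis.
u = first_principal_component([[3.0, 0.0], [2.0, 0.0], [1.0, 0.0]])
s_d = difference_vector([2.0, 5.0], u)
```

In this toy case the first principal component is the first axis, so the difference vector keeps only the second coordinate of the sentence vector.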
6. The method of constructing a knowledge graph according to claim 2, wherein the word parameters further include entity types, and the processing the text data to obtain a plurality of word parameters further includes the steps of:
inputting the word vector of the word into an entity recognition model to obtain the entity type of the word;
the step of extracting the text data according to the relation rule base to obtain second tuple data and updating the tuple database comprises the following steps of:
extracting the text data according to the relation rule base to obtain a plurality of second tuple data;
selecting second tuple data which is the same as a preset entity type according to the entity type of the word;
adding second tuple data with the same type as the preset entity into the tuple database to update the tuple database.
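The entity-type filter of claim 6 might look like the sketch below. It assumes that tuple data are (head, tail) word pairs, that the entity recognition model has already produced a word-to-type mapping, and that the preset entity type is a pair of type labels. The names `filter_tuples`, `entity_types`, and the labels `FOOD`, `CATEGORY`, and `DATE` are hypothetical.

```python
def filter_tuples(candidates, entity_types, expected_types):
    """Keep (head, tail) tuples whose entity types equal the preset pair."""
    kept = []
    for head, tail in candidates:
        types = (entity_types.get(head), entity_types.get(tail))
        if types == expected_types:
            kept.append((head, tail))
    return kept

# Hypothetical output of the entity recognition model.
entity_types = {"strawberry": "FOOD", "fruit": "CATEGORY", "Monday": "DATE"}
# Hypothetical second tuple data extracted with the relation rule base.
candidates = [("strawberry", "fruit"), ("Monday", "fruit")]

tuple_db = {("apple", "fruit")}
tuple_db.update(filter_tuples(candidates, entity_types, ("FOOD", "CATEGORY")))
```

Only the tuple whose types match the preset pair is added, so the mistyped candidate ("Monday", "fruit") never reaches the tuple database.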
7. The method of knowledge-graph construction according to claim 1, further comprising the step of:
repeatedly executing the step of updating the relation rule base according to the difference vector, so as to update the tuple database and the relation rule base, and stopping the updating once the similarity is less than the preset value.
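The iterative update of claims 1 and 7 can be sketched as a loop with a similarity gate. The claims do not name the similarity measure, so cosine similarity is an assumption here, as are the comparison against the most recently accepted difference vector, the treatment of equality as a stop condition, and the `max_rounds` safety cap. The callables `extract_tuples` and `derive_rule` are hypothetical stand-ins for the extraction and rule-derivation steps.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def bootstrap(rule_base, tuple_db, diff_vec, extract_tuples, derive_rule,
              threshold=0.8, max_rounds=10):
    """Alternate tuple extraction and rule updates until similarity drops."""
    for _ in range(max_rounds):
        # Extract second tuple data with the current rules; update the database.
        tuple_db |= extract_tuples(rule_base)
        # Derive a new rule and its difference vector from the updated database.
        new_rule, new_diff = derive_rule(tuple_db)
        if cosine_similarity(new_diff, diff_vec) <= threshold:
            break  # similarity not above the preset value: stop updating
        rule_base.add(new_rule)
        diff_vec = new_diff
    return rule_base, tuple_db

# Toy stand-ins: the second candidate rule drifts away and ends the loop.
def extract_tuples(rules):
    return {("strawberry", "fruit")} if "X is a Y" in rules else set()

candidates = iter([("X belongs to Y", [1.0, 0.1]), ("X near Y", [0.0, 1.0])])
rules, tuples = bootstrap({"X is a Y"}, set(), [1.0, 0.0],
                          extract_tuples, lambda db: next(candidates))
```

The gate keeps a drifting rule out of the rule base: the first candidate is close to the existing difference vector and is accepted, while the second is nearly orthogonal and stops the loop.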
8. A knowledge-graph building system, comprising:
a first module for acquiring text data;
a second module for processing the text data to obtain a plurality of word parameters;
a third module for determining a difference vector from the text data and the plurality of word parameters, wherein the difference vector is determined by:
acquiring a relation rule base and a tuple database;
extracting a relation rule in the text data according to first tuple data in the tuple database;
determining sentence vectors according to the relation rules and the word parameters;
determining principal component feature vectors of the relation rule base through a principal component analysis method;
determining a difference vector in the sentence vector according to the principal component feature vector;
a fourth module for updating the relation rule base according to the difference vector, wherein the relation rule base is updated by:
extracting the text data according to the relation rule base to obtain second tuple data, and updating the tuple database;
determining a new difference vector and a new relation rule according to the text data and the plurality of word parameters based on the new tuple database;
and determining a similarity between the new difference vector and the difference vector, and updating the relation rule base according to the new relation rule when the similarity is greater than a preset value.
9. A knowledge-graph building apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of knowledge-graph construction according to any one of claims 1 to 7.
10. A computer-readable storage medium in which a processor-executable program is stored, wherein the processor-executable program, when executed by the processor, is for implementing the method of knowledge-graph construction according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111505685.1A CN114372150B (en) | 2021-12-10 | 2021-12-10 | Knowledge graph construction method, system, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111505685.1A CN114372150B (en) | 2021-12-10 | 2021-12-10 | Knowledge graph construction method, system, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114372150A (en) | 2022-04-19 |
CN114372150B (en) | 2024-05-07 |
Family
ID=81139764
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111505685.1A Active CN114372150B (en) | 2021-12-10 | 2021-12-10 | Knowledge graph construction method, system, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114372150B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110489751A (en) * | 2019-08-13 | 2019-11-22 | 腾讯科技(深圳)有限公司 | Text similarity computing method and device, storage medium, electronic equipment |
CN111309925A (en) * | 2020-02-10 | 2020-06-19 | 同方知网(北京)技术有限公司 | Knowledge graph construction method of military equipment |
WO2021139229A1 (en) * | 2020-07-31 | 2021-07-15 | 平安科技(深圳)有限公司 | Text rhetorical sentence generation method, apparatus and device, and readable storage medium |
Non-Patent Citations (1)
Title |
---|
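Wei Tao; Wang Jinhua: "Knowledge Graph Construction Based on Non-Taxonomic Relation Extraction Technology", Industrial Technology Innovation, no. 02, 30 April 2020 (2020-04-30), pages 23-28 * |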
Also Published As
Publication number | Publication date |
---|---|
CN114372150B (en) | 2024-05-07 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN112101190B (en) | Remote sensing image classification method, storage medium and computing device | |
US20210382937A1 (en) | Image processing method and apparatus, and storage medium | |
US11232141B2 (en) | Method and device for processing an electronic document | |
CN108073902B (en) | Video summarizing method and device based on deep learning and terminal equipment | |
CN111783713B (en) | Weak supervision time sequence behavior positioning method and device based on relation prototype network | |
CN111950728B (en) | Image feature extraction model construction method, image retrieval method and storage medium | |
CN110135505B (en) | Image classification method and device, computer equipment and computer readable storage medium | |
CN109918498B (en) | Problem warehousing method and device | |
CN111767796A (en) | Video association method, device, server and readable storage medium | |
CN111368887B (en) | Training method of thunderstorm weather prediction model and thunderstorm weather prediction method | |
CN114283350B (en) | Visual model training and video processing method, device, equipment and storage medium | |
CN110188422B (en) | Method and device for extracting feature vector of node based on network data | |
CN110825894A (en) | Data index establishing method, data index retrieving method, data index establishing device, data index retrieving device, data index establishing equipment and storage medium | |
CN113011529B (en) | Training method, training device, training equipment and training equipment for text classification model and readable storage medium | |
CN116795947A (en) | Document recommendation method, device, electronic equipment and computer readable storage medium | |
CN114329711A (en) | Prefabricated part data processing method and system based on graph computation platform | |
CN113821657A (en) | Artificial intelligence-based image processing model training method and image processing method | |
CN114372150B (en) | Knowledge graph construction method, system, device and storage medium | |
CN116578700A (en) | Log classification method, log classification device, equipment and medium | |
CN110909551B (en) | Language pre-training model updating method and device, electronic equipment and storage medium | |
CN114239842A (en) | Information processing apparatus, information processing system, and information processing method | |
CN113901175A (en) | Article relation judging method and device | |
CN113705589A (en) | Data processing method, device and equipment | |
CN111708908A (en) | Video tag adding method and device, electronic equipment and computer-readable storage medium | |
CN113935387A (en) | Text similarity determination method and device and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||