CN114372150A - Knowledge graph construction method, system, device and storage medium - Google Patents

Knowledge graph construction method, system, device and storage medium

Info

Publication number
CN114372150A
Authority
CN
China
Prior art keywords
word
vector
determining
text data
tuple
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111505685.1A
Other languages
Chinese (zh)
Other versions
CN114372150B (en)
Inventor
李洁
龚晟
杨震
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi IoT Technology Co Ltd
Original Assignee
Tianyi IoT Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi IoT Technology Co Ltd filed Critical Tianyi IoT Technology Co Ltd
Priority to CN202111505685.1A priority Critical patent/CN114372150B/en
Publication of CN114372150A publication Critical patent/CN114372150A/en
Application granted granted Critical
Publication of CN114372150B publication Critical patent/CN114372150B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge graph construction method, system, device and storage medium, and relates to the technical field of computers. The knowledge graph construction method comprises the following steps: acquiring text data; processing the text data to obtain a plurality of word parameters; determining a difference vector according to the text data and the plurality of word parameters; and updating a relation rule base according to the difference vector. The method uses principal component analysis to remove the information that an extracted relation rule shares with the current relation rule base, and then updates the relation rule base with the new relation rule when the similarity between the difference vectors of two successive extractions is greater than a preset value. Newly generated relation rules can thus be screened more effectively, the amount of computation is reduced, and the accuracy of knowledge graph relation extraction is improved.

Description

Knowledge graph construction method, system, device and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a knowledge graph construction method, a knowledge graph construction system, a knowledge graph construction device and a storage medium.
Background
In constructing an industry knowledge graph, extracting the relations between entities in text is a key and difficult problem. Traditional rule-based recognition methods have high labor costs and low recall. Supervised learning methods require a large amount of labeled sample data, which is labor-intensive. Semi-supervised learning methods reduce the labor input, but their accuracy drops rapidly as the number of iterations increases.
Disclosure of Invention
The present invention is directed to solving at least one of the problems of the prior art. Therefore, the invention provides a method, a system, a device and a storage medium for constructing a knowledge graph, which can improve the accuracy of the knowledge graph while improving the construction efficiency of the knowledge graph.
In one aspect, an embodiment of the present invention provides a method for constructing a knowledge graph, including the following steps:
acquiring text data;
processing the text data to obtain a plurality of word parameters;
determining a difference vector from the text data and the plurality of word parameters, wherein the difference vector is determined by:
acquiring a relation rule base and a tuple database;
extracting a relation rule in the text data according to first tuple data in the tuple database;
determining sentence vectors according to the relation rules and the word parameters;
determining principal component feature vectors of the relation rule base through a principal component analysis method;
determining a difference vector in the sentence vector according to the principal component feature vector;
updating the relation rule base according to the difference vector, wherein the relation rule base is updated by:
extracting the text data according to the relation rule base to obtain second tuple data, and updating the tuple database;
determining a new difference vector and a new relation rule according to the text data and the plurality of word parameters based on the updated tuple database;
and determining the similarity between the new difference vector and the difference vector, and updating the relation rule base according to the new relation rule when the similarity is greater than a preset value.
According to some embodiments of the invention, the word parameters comprise word vectors and tf-idf values, and the processing the text data to obtain a plurality of word parameters comprises:
performing word segmentation processing on the text data to obtain a plurality of words;
determining a word frequency of each of the words in the text data;
determining the tf-idf value of each word according to the word frequency;
determining a word vector for each of the words via a neural network coding model.
According to some embodiments of the invention, the determining a sentence vector according to the relation rule and the word parameters comprises the steps of:
determining a plurality of the word parameters included in the relation rule;
determining the sentence vector from the plurality of word parameters, wherein the sentence vector is determined by the formula:
Figure BDA0003403090000000021
where $S$ denotes the sentence vector, $n$ denotes the number of words contained in the sentence, $t_i$ denotes the tf-idf value of the $i$-th word, and $V_i$ denotes the word vector of the $i$-th word.
According to some embodiments of the invention, the determining a difference vector in the sentence vector according to the principal component feature vector comprises:
determining a first principal component in the principal component feature vector;
and determining the difference between the sentence vector and the projection of the sentence vector on the first principal component to obtain the difference vector.
According to some embodiments of the invention, the difference vector is determined by the following formula:
$S_d = S - u u^{T} S$;
where $S_d$ denotes the difference vector, $S$ denotes the sentence vector, $u$ denotes the first principal component, and $u^{T}$ denotes the transpose of the first principal component.
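As a short check of what this formula does, the following derivation assumes that the first principal component $u$ has unit length ($u^{T}u = 1$), which is the usual convention for principal components but is not stated explicitly in the text. Under that assumption the difference vector has no component along $u$:

```latex
\begin{aligned}
u^{T} S_d &= u^{T}\left(S - u u^{T} S\right) \\
          &= u^{T} S - \left(u^{T} u\right) u^{T} S \\
          &= u^{T} S - u^{T} S \\
          &= 0
\end{aligned}
```

In other words, $S_d$ keeps only the part of the sentence vector that is orthogonal to the direction shared most strongly by the rules already in the relation rule base.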
According to some embodiments of the present invention, the word parameter further includes an entity type, and the processing the text data to obtain the plurality of word parameters further includes:
inputting the word vector of the word into an entity recognition model to obtain the entity type of the word;
the step of extracting the text data according to the relation rule base to obtain second tuple data and updating the tuple database comprises the following steps of:
extracting the text data according to the relation rule base to obtain a plurality of second tuple data;
selecting second tuple data which is the same as a preset entity type according to the entity type of the word;
adding second tuple data with the same type as the preset entity into the tuple database to update the tuple database.
According to some embodiments of the invention, the knowledge graph construction method further comprises the following step:
repeatedly executing the step of updating the relation rule base according to the difference vector, so as to update the tuple database and the relation rule base, and stopping the updating when the similarity is less than the preset value.
On the other hand, the embodiment of the invention also provides a knowledge graph construction system, which comprises:
a first module for acquiring text data;
the second module is used for processing the text data to obtain a plurality of word parameters;
a third module for determining a difference vector from the text data and the plurality of word parameters, wherein the difference vector is determined by:
acquiring a relation rule base and a tuple database;
extracting a relation rule in the text data according to first tuple data in the tuple database;
determining sentence vectors according to the relation rules and the word parameters;
determining principal component feature vectors of the relation rule base through a principal component analysis method;
determining a difference vector in the sentence vector according to the principal component feature vector;
a fourth module for updating the relation rule base according to the difference vector, wherein the relation rule base is updated by:
extracting the text data according to the relation rule base to obtain second tuple data, and updating the tuple database;
determining a new difference vector and a new relation rule according to the text data and the plurality of word parameters based on the updated tuple database;
and determining the similarity between the new difference vector and the difference vector, and updating the relation rule base according to the new relation rule when the similarity is greater than a preset value.
On the other hand, the embodiment of the invention also provides a knowledge graph construction device, which comprises:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the knowledge graph construction method described above.
In another aspect, the embodiment of the present invention further provides a computer-readable storage medium, which stores computer-executable instructions for causing a computer to execute the method for constructing a knowledge graph as described above.
The technical scheme of the invention has at least one of the following advantages or beneficial effects. The knowledge graph is constructed by extracting relation rules and second tuple data from the acquired text data so as to continuously update the tuple database and the relation rule base. A sentence vector is determined from each extracted relation rule, and principal component analysis is used to determine the difference vector between the sentence vector and the principal component feature vector of the current relation rule base, so that relation rules that differ to some extent from the current relation rule base are selected for further analysis. Then, based on the two difference vectors obtained from the tuple database before and after the update, the currently extracted relation rule is added to the relation rule base when the similarity of the two difference vectors is greater than a preset value. In this way, the method uses principal component analysis to remove the information that an extracted relation rule shares with the current relation rule base, and updates the relation rule base when the similarity between the difference vectors of two successive extractions is greater than a preset value, so that newly generated relation rules are screened more effectively, the amount of computation is reduced, and the accuracy of knowledge graph relation extraction is improved.
Drawings
FIG. 1 is a flow chart of a knowledge graph construction method provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a knowledge graph building system provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of a knowledge graph constructing apparatus according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or components having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplicity of description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, terms such as "first" and "second" are used only to distinguish technical features; they are not to be interpreted as indicating or implying relative importance, the number of the indicated technical features, or the order of the indicated technical features.
Referring to FIG. 1, the knowledge graph construction method according to an embodiment of the present invention includes, but is not limited to, step S100, step S200, step S300, and step S400.
Step S100, acquiring text data;
step S200, processing the text data to obtain a plurality of word parameters;
step S300, determining a difference vector according to the text data and the plurality of word parameters, wherein the difference vector is determined through the following steps:
step S310, a relation rule base and a tuple database are obtained;
step S320, extracting a relation rule in the text data according to the first tuple data in the tuple database;
step S330, determining sentence vectors according to the relation rules and the word parameters;
step S340, determining principal component characteristic vectors of the relation rule base through a principal component analysis method;
step S350, determining a difference vector in the sentence vector according to the principal component feature vector;
step S400, updating the relation rule base according to the difference vector, wherein the relation rule base is updated through the following steps:
step S410, extracting the text data according to the relation rule base to obtain second tuple data, and updating the tuple database;
step S420, determining a new difference vector and a new relation rule according to the text data and the word parameters based on the updated tuple database;
and step S430, determining the similarity between the new difference vector and the difference vector, and updating the relation rule base according to the new relation rule when the similarity is greater than a preset value.
Specifically, the text data may be obtained from a network or entered manually. The text data is then processed to obtain a plurality of word parameters: for example, word segmentation, weight calculation, encoding, and entity type recognition are performed on the text data to obtain the word parameters of a plurality of words, which may include tf-idf values, word vectors, entity types, and the like. Next, a manually initialized tuple database is obtained, the corresponding relation rules are extracted from the text data according to the initial first tuple data in the tuple database, and these relation rules are added to the relation rule base to initialize it. For example, if the initial first tuple data are "apple belongs to fruit" and "fruit belongs to food", the relation rule between "strawberry" and "food" in the text data, namely "strawberry belongs to food", can be identified according to the first tuple data. A sentence vector $S_1$ is then generated from the word parameters of the words in the extracted relation rule, such as "strawberry" and "food"; the principal component feature vector $W_1$ of the relation rule base is calculated by principal component analysis; and the difference vector $S_{d1}$ is determined from the sentence vector $S_1$ and the principal component feature vector $W_1$. The text data is then extracted according to the relation rule base to obtain second tuple data, which is added to the tuple database to update it. Based on the updated tuple database, a new difference vector $S_{d2}$ and a new relation rule are determined according to the text data and the word parameters, and the similarity between the difference vectors $S_{d2}$ and $S_{d1}$ is calculated. When the similarity is greater than a preset value, the new relation rule is added to the relation rule base to update it.
The text data is then extracted again according to the relation rule base to obtain further second tuple data, which is added to the tuple database to update it. Based on the updated tuple database, a new difference vector $S_{d3}$ and a new relation rule are determined according to the text data and the word parameters, and the similarity between the difference vectors $S_{d3}$ and $S_{d2}$ is calculated. When the similarity is greater than the preset value, the new relation rule is added to the relation rule base to update it. The process continues in the same way, continuously updating the tuple database and the relation rule base, until the relation rule base and the tuple data no longer grow, that is, until the similarity is smaller than the preset value, at which point the construction of the knowledge graph is complete.
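The accept-or-stop logic of this iteration can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not name the similarity measure, so cosine similarity is assumed; the function names `cosine_similarity` and `run_updates` are hypothetical; and each round's new relation rule and difference vector are passed in as precomputed toy values rather than being extracted from text.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def run_updates(rounds, preset_value):
    """Grow the rule base round by round until the similarity test fails.

    `rounds` is a list of (relation_rule, difference_vector) pairs, one per
    extraction round. The first pair plays the role of the initial rule and
    its difference vector S_d1.
    """
    first_rule, previous_diff = rounds[0]
    rule_base = [first_rule]  # rule base initialized from the first extraction
    for rule, diff in rounds[1:]:
        # The text only covers "greater than" (update) and "less than" (stop);
        # treating equality as a stop is an assumption of this sketch.
        if cosine_similarity(diff, previous_diff) <= preset_value:
            break
        rule_base.append(rule)
        previous_diff = diff
    return rule_base


# Toy rounds: the third difference vector points in a very different direction.
toy_rounds = [
    ("X belongs to Y", [1.0, 0.0]),
    ("X is a kind of Y", [0.9, 0.1]),
    ("X was founded by Y", [0.0, 1.0]),
]
accepted = run_updates(toy_rounds, preset_value=0.8)
print(accepted)
```

With these toy values the second rule is accepted and the third round ends the loop, which mirrors the stop condition described above.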
In this embodiment, the knowledge graph is constructed by extracting relation rules and tuple data from the acquired text data so as to continuously update the tuple database and the relation rule base. A sentence vector is determined from each extracted relation rule, and principal component analysis is used to determine the difference vector between the sentence vector and the principal component feature vector of the current relation rule base, so that relation rules that differ to some extent from the current relation rule base are selected for further analysis. Then, based on the two difference vectors obtained from the tuple database before and after the update, the currently extracted relation rule is added to the relation rule base when the similarity of the two difference vectors is greater than a preset value. In this way, the method uses principal component analysis to remove the information that an extracted relation rule shares with the current relation rule base, and updates the relation rule base when the similarity between the difference vectors of two successive extractions is greater than a preset value, so that newly generated relation rules are screened more effectively, the amount of computation is reduced, and the accuracy of knowledge graph relation extraction is improved.
According to some embodiments of the invention, the word parameters include a word vector and tf-idf values, and step S200 includes, but is not limited to, the following steps:
step S210, performing word segmentation processing on the text data to obtain a plurality of words;
step S220, determining the word frequency of each word in the text data;
step S230, determining the tf-idf value of each word according to the word frequency;
step S240, determining a word vector of each word through a neural network coding model.
Specifically, word segmentation is performed on the text data to obtain a plurality of words, and the word frequency of each word in the text data is determined. The tf-idf value of each word is then determined from the word frequency using the TF-IDF technique, and the word vector of each word is determined through a neural network coding model.
It should be noted that TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining. TF stands for term frequency, and IDF stands for inverse document frequency.
It should be noted that the neural network coding model may be a word2vec model or a fastText model.
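Steps S210 to S230 can be illustrated with a small standard-library sketch. Several choices here are assumptions made for illustration only: whitespace splitting stands in for a real word segmenter, term frequency is taken as count divided by document length, and a smoothed inverse document frequency `log((1 + N) / (1 + df)) + 1` is used, since the text does not fix a particular TF-IDF variant. The function name `tf_idf` is hypothetical, and the word-vector step S240 is not shown.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: tf-idf value} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            word: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return scores


# Whitespace splitting stands in for word segmentation (step S210).
corpus = ["strawberry is a fruit", "fruit is a food", "apple is a fruit"]
tokenized = [sentence.split() for sentence in corpus]
weights = tf_idf(tokenized)
print(weights[0])
```

In this toy corpus "fruit" occurs in every sentence, so it receives a lower weight than "strawberry", which occurs only once; this is the behaviour the weighting is meant to provide.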
According to some embodiments of the invention, step S330 includes, but is not limited to, the following steps:
step S331, determining a plurality of word parameters contained in the relation rule;
step S332, determining a sentence vector according to the word parameters, wherein the sentence vector is determined by the following formula:
Figure BDA0003403090000000061
where $S$ denotes the sentence vector, $n$ denotes the number of words contained in the sentence, $t_i$ denotes the tf-idf value of the $i$-th word, and $V_i$ denotes the word vector of the $i$-th word.
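The formula itself survives only as an image reference in this text, so its exact form is not recoverable here. The sketch below assumes one form that is consistent with the symbol definitions, namely a tf-idf weighted average of the word vectors, S = (1/n) * sum of t_i * V_i; this is an assumption, not a confirmed reading of the original figure, and the function name `sentence_vector` is hypothetical.

```python
def sentence_vector(word_vectors, tfidf_values):
    """tf-idf weighted average of word vectors (assumed form of the formula)."""
    if not word_vectors or len(word_vectors) != len(tfidf_values):
        raise ValueError("need at least one word and one tf-idf value per word")
    n = len(word_vectors)
    dim = len(word_vectors[0])
    return [
        sum(t * v[k] for t, v in zip(tfidf_values, word_vectors)) / n
        for k in range(dim)
    ]


# Toy 3-dimensional word vectors V_i and tf-idf values t_i for a two-word rule.
toy_vectors = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0]]
toy_tfidf = [0.5, 1.5]
s = sentence_vector(toy_vectors, toy_tfidf)
print(s)
```

Whatever the exact normalization in the original figure, the design idea is that words with higher tf-idf values contribute more to the sentence vector of a relation rule.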
According to some embodiments of the invention, step S350 includes, but is not limited to, the following steps:
step S351, determining a first principal component in the principal component feature vector;
step S352, determining the difference between the sentence vector and the projection of the sentence vector on the first principal component to obtain the difference vector.
Specifically, the difference vector is determined by the following formula:
$S_d = S - u u^{T} S$;
where $S_d$ denotes the difference vector, $S$ denotes the sentence vector, $u$ denotes the first principal component, and $u^{T}$ denotes the transpose of the first principal component.
It should be noted that, in general, the principal component feature vector includes a plurality of principal components: the first principal component represents the most information common to the relation rules in the relation rule base, the second principal component represents the next most, and so on. The embodiment of the present invention is not limited to obtaining the difference vector from the difference between the sentence vector and its projection on the first principal component; the difference vector may also be obtained from the difference between the sentence vector and its projection on all principal components, or on the first several principal components.
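Steps S340 to S352 can be sketched in plain Python as follows. The sketch makes several assumptions that the text does not specify: the sentence vectors of the rules in the rule base are mean-centered before the principal component is estimated, the first principal component is found by power iteration on the covariance matrix, and the helper names and toy data are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Estimate the unit-length first principal component by power iteration."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(dim)]
    centered = [[r[k] - mean[k] for k in range(dim)] for r in rows]
    # Covariance matrix (dim x dim) of the centered rows.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
           for i in range(dim)]
    # Fixed start vector for reproducibility; it must not be orthogonal
    # to the dominant direction of the data.
    u = [1.0] * dim
    for _ in range(iterations):
        u = [sum(cov[i][j] * u[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            raise ValueError("cannot estimate a principal component")
        u = [x / norm for x in u]
    return u


def difference_vector(s, u):
    """Return S_d = S - u u^T S, i.e. S minus its projection on u."""
    projection = sum(ui * si for ui, si in zip(u, s))
    return [si - projection * ui for si, ui in zip(s, u)]


# Hypothetical sentence vectors of rules already in the relation rule base.
rule_base_vectors = [[2.0, 0.1], [4.0, -0.1], [6.0, 0.1], [8.0, -0.1]]
u = first_principal_component(rule_base_vectors)
s_d = difference_vector([3.0, 1.0], u)
print(u, s_d)
```

To remove several principal components instead of one, as the paragraph above allows, the same subtraction would be repeated for each component of an orthonormal set.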
According to some embodiments of the present invention, the word parameter further includes an entity type, and step S200 further includes, but is not limited to, the following steps:
step S250, inputting the word vector of the word into the entity recognition model to obtain the entity type of the word;
step S410 includes, but is not limited to, the following steps:
step S411, extracting the text data according to a relation rule base to obtain a plurality of second tuple data;
step S412, selecting second tuple data which is the same as the preset entity type according to the entity type of the word;
step S413, adding the second tuple data whose entity type is the same as the preset entity type into the tuple database to update the tuple database.
Specifically, after the text data is extracted according to the relation rule base to obtain a plurality of second tuple data, the second tuple data whose entity type matches the preset entity type is selected according to the entity type of the words in the second tuple data. For example, if the preset entity type is a place name, the second tuple data whose entity type is a place name is selected from the plurality of second tuple data. The selected second tuple data is then added to the tuple database to update it, so that the knowledge graph can be constructed around the required theme, which improves the efficiency and accuracy of knowledge graph construction.
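The filtering in steps S411 to S413 might look like the sketch below. The tuple layout (`RelationTuple` with a head, a relation, a tail, and one entity type per entity) and the rule that both entities must have a preset type are assumptions made for illustration; the text only says that tuple data matching the preset entity type is selected.

```python
from typing import NamedTuple


class RelationTuple(NamedTuple):
    """Hypothetical layout of one extracted tuple with entity types attached."""
    head: str
    relation: str
    tail: str
    head_type: str
    tail_type: str


def filter_by_entity_type(candidates, preset_types):
    """Keep tuples whose two entities both have a preset entity type."""
    return [t for t in candidates
            if t.head_type in preset_types and t.tail_type in preset_types]


# Candidate second tuple data (step S411), with toy entity types.
candidates = [
    RelationTuple("Guangzhou", "located_in", "Guangdong", "place", "place"),
    RelationTuple("strawberry", "belongs_to", "fruit", "food", "food"),
]
tuple_database = set()
# Steps S412 and S413: select by preset type, then add to the tuple database.
tuple_database.update(filter_by_entity_type(candidates, {"place"}))
print(tuple_database)
```

Using a set for the tuple database is a convenience of this sketch: re-adding a tuple that was already extracted in an earlier round leaves the database unchanged.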
According to some embodiments of the present invention, the method for constructing a knowledge graph further includes, but is not limited to, the following steps:
step S600, repeatedly executing the step of updating the relation rule base according to the difference vector, so as to update the tuple database and the relation rule base, and stopping the updating when the similarity is smaller than the preset value.
The embodiment of the present invention further provides a knowledge graph construction system, referring to fig. 2, including:
a first module for acquiring text data;
the second module is used for processing the text data to obtain a plurality of word parameters;
a third module for determining a difference vector from the text data and the plurality of word parameters, wherein the difference vector is determined by:
acquiring a relation rule base and a tuple database;
extracting a relation rule in the text data according to first tuple data in the tuple database;
determining sentence vectors according to the relation rules and the word parameters;
determining principal component characteristic vectors of a relation rule base through a principal component analysis method;
determining a difference vector in the sentence vector according to the principal component feature vector;
a fourth module for updating the relation rule base according to the difference vector, wherein the relation rule base is updated by:
extracting the text data according to the relation rule base to obtain second tuple data, and updating the tuple database;
determining a new difference vector and a new relation rule according to the text data and the plurality of word parameters based on the updated tuple database;
and determining the similarity between the new difference vector and the difference vector, and updating the relation rule base according to the new relation rule when the similarity is greater than a preset value.
It can be understood that the contents of the knowledge graph construction method embodiment all apply to this system embodiment; the functions implemented by the system embodiment are the same as those of the method embodiment, and the beneficial effects achieved are also the same as those of the method embodiment.
Referring to FIG. 3, which is a schematic diagram of a knowledge graph constructing apparatus according to an embodiment of the present invention, the knowledge graph constructing apparatus comprises one or more control processors and memories; FIG. 3 takes one control processor and one memory as an example.
The control processor and the memory may be connected by a bus or by other means; FIG. 3 takes a bus connection as an example.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located from the control processor, and the remote memory may be connected to the knowledge-graph constructing apparatus via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Those skilled in the art will appreciate that the configuration of the apparatus shown in FIG. 3 does not constitute a limitation of the knowledge-graph building apparatus and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
The non-transitory software programs and instructions required to implement the method of knowledge-graph construction applied to the knowledge-graph constructing apparatus in the above-described embodiments are stored in a memory and, when executed by a control processor, perform the method of knowledge-graph construction applied to the knowledge-graph constructing apparatus in the above-described embodiments.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium storing computer-executable instructions which, when executed by one or more control processors, cause the one or more control processors to execute the knowledge graph construction method of the method embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (10)

1. A knowledge graph construction method, characterized by comprising the following steps:
acquiring text data;
processing the text data to obtain a plurality of word parameters;
determining a difference vector from the text data and the plurality of word parameters, wherein the difference vector is determined by:
acquiring a relation rule base and a tuple database;
extracting a relation rule in the text data according to first tuple data in the tuple database;
determining sentence vectors according to the relation rules and the word parameters;
determining principal component feature vectors of the relation rule base through a principal component analysis method;
determining a difference vector in the sentence vector according to the principal component feature vector;
updating the relation rule base according to the difference vector, wherein the relation rule base is updated by:
extracting the text data according to the relation rule base to obtain second tuple data, and updating the tuple database;
determining a new difference vector and a new relation rule according to the text data and the plurality of word parameters based on the new tuple database;
and determining the similarity between the new difference vector and the difference vector, and updating the relation rule base according to the new relation rule when the similarity is greater than a preset value.
2. The method of constructing a knowledge graph according to claim 1, wherein the word parameters include word vectors and tf-idf values, and the processing the text data to obtain a plurality of word parameters includes the steps of:
performing word segmentation processing on the text data to obtain a plurality of words;
determining a word frequency of each of the words in the text data;
determining the tf-idf value of each word according to the word frequency;
determining a word vector for each of the words via a neural network coding model.
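The word-frequency and tf-idf steps of claim 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes whitespace tokenization stands in for word segmentation, uses the common `tf * log(N / df)` weighting (the claim does not fix a particular variant), and omits the neural network coding model that produces the word vectors. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {word: tf-idf} dict per document.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores

# Whitespace splitting stands in for a real word segmenter (assumption).
docs = [s.split() for s in [
    "beijing is the capital of china",
    "paris is the capital of france",
]]
weights = tf_idf(docs)
```

Words that appear in every document (such as "is" here) receive a weight of zero under this variant, while words specific to one sentence keep a positive weight.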
3. The method of knowledge-graph construction according to claim 2, wherein said determining sentence vectors according to said relationship rules and said word parameters comprises the steps of:
determining a plurality of the word parameters included in the relationship rule;
determining the sentence vector from a plurality of the word parameters, wherein the sentence vector is determined by the formula:
Figure FDA0003403089990000011
wherein S represents the sentence vector, n represents the number of words contained in the sentence vector, $t_i$ represents the tf-idf value of the i-th word, and $V_i$ represents the word vector of the i-th word.
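The formula of claim 3 is only available as a figure reference here, so the sketch below rests on an assumption: that the sentence vector is the tf-idf-weighted average of the word vectors, S = (1/n) Σ t_i V_i, which is consistent with the symbol definitions above but is not confirmed by the text. The function name `sentence_vector` is hypothetical.

```python
def sentence_vector(tfidf_values, word_vectors):
    """Assumed form (not confirmed by the source): S = (1/n) * sum_i t_i * V_i."""
    n = len(word_vectors)
    if n == 0 or n != len(tfidf_values):
        raise ValueError("need at least one word and one tf-idf value per word vector")
    dim = len(word_vectors[0])
    total = [0.0] * dim
    for t, v in zip(tfidf_values, word_vectors):
        for k in range(dim):
            total[k] += t * v[k]      # accumulate t_i * V_i component-wise
    return [x / n for x in total]     # divide by the number of words n

# Two words with tf-idf values 0.5 and 1.0 and 2-dimensional word vectors.
S = sentence_vector([0.5, 1.0], [[2.0, 0.0], [0.0, 4.0]])
```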
4. The method of constructing a knowledge graph according to claim 3, wherein the determining a difference vector in the sentence vector according to the principal component feature vector comprises the steps of:
determining a first principal component in the principal component feature vector;
and determining the difference between the sentence vector and the projection value of the sentence vector on the first principal component to obtain the difference vector.
5. The method of knowledge-graph construction according to claim 4, wherein the difference vector is determined by the following formula:
$S_d = S - u u^{T} S$;
wherein $S_d$ represents the difference vector, S represents the sentence vector, u represents the first principal component, and $u^{T}$ represents the transpose of the first principal component.
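The projection-removal step of claims 4 and 5 can be illustrated with a short sketch. It assumes the first principal component u has already been obtained by principal component analysis of the relation rule base, and it normalizes u to unit length so that u uᵀ acts as a projection; that normalization is an assumption, as is the function name `difference_vector`.

```python
import math

def difference_vector(s, u):
    """Return S_d = S - u u^T S for a sentence vector s and first principal component u."""
    norm = math.sqrt(sum(x * x for x in u))
    if norm == 0:
        raise ValueError("u must be non-zero")
    u = [x / norm for x in u]                     # assumption: u is scaled to unit length
    proj = sum(a * b for a, b in zip(u, s))       # scalar u^T S
    return [a - proj * b for a, b in zip(s, u)]   # subtract the component along u

S_d = difference_vector([3.0, 4.0], [1.0, 0.0])
```

By construction the result is orthogonal to u, so it keeps only the part of the sentence vector that the dominant direction of the rule base does not explain.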
6. The method of constructing a knowledge graph according to claim 2, wherein the word parameters further include entity types, and the processing the text data to obtain a plurality of word parameters further includes the steps of:
inputting the word vector of the word into an entity recognition model to obtain the entity type of the word;
the step of extracting the text data according to the relation rule base to obtain second tuple data and updating the tuple database comprises the following steps of:
extracting the text data according to the relation rule base to obtain a plurality of second tuple data;
selecting, according to the entity type of the word, the second tuple data whose entity type is the same as a preset entity type;
adding the second tuple data whose entity type is the same as the preset entity type into the tuple database to update the tuple database.
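The entity-type filter of claim 6 can be sketched as below. The data layout is an assumption made for illustration: each tuple is taken to be a (head, tail) pair of words, `entity_types` maps a word to the type returned by the entity recognition model, and the preset entity type is a (head type, tail type) pair. None of these names come from the source.

```python
def filter_tuples(candidates, entity_types, preset_types):
    """Keep the (head, tail) tuples whose entity types equal the preset pair."""
    return [
        (head, tail)
        for head, tail in candidates
        if (entity_types.get(head), entity_types.get(tail)) == preset_types
    ]

# Hypothetical output of the entity recognition model.
entity_types = {"Beijing": "CITY", "China": "COUNTRY", "Monday": "DATE"}
candidates = [("Beijing", "China"), ("Monday", "China")]
kept = filter_tuples(candidates, entity_types, ("CITY", "COUNTRY"))
```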
7. The method of knowledge-graph construction according to claim 1, further comprising the steps of:
and repeatedly executing the step of updating the relation rule base according to the difference vector to update the tuple database and the relation rule base until the similarity is less than the preset value, and stopping updating.
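The iteration described in claims 1 and 7 can be sketched as a loop. This is an illustration under several assumptions: cosine similarity is used as the similarity measure (the claims do not name one), the reference difference vector is replaced by the new one after each accepted round, a similarity exactly equal to the preset value stops the loop, and `extract_tuples` and `derive_rule` are hypothetical callbacks standing in for the extraction and rule-derivation steps. `max_rounds` is an added safety limit.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def bootstrap(rules, tuples, extract_tuples, derive_rule, diff_vec, threshold, max_rounds=10):
    """Alternate tuple extraction and rule updates until similarity drops to the threshold.

    extract_tuples(rules) -> set of tuples            (hypothetical callback)
    derive_rule(tuples)   -> (new_rule, new_diff_vec) (hypothetical callback)
    """
    rules, tuples = set(rules), set(tuples)     # work on copies of the inputs
    for _ in range(max_rounds):
        tuples |= extract_tuples(rules)         # update the tuple database
        new_rule, new_diff = derive_rule(tuples)
        if cosine(new_diff, diff_vec) <= threshold:
            break                               # similarity too low: stop updating
        rules.add(new_rule)                     # update the relation rule base
        diff_vec = new_diff                     # assumption: compare against the latest vector
    return rules, tuples

# Stub callbacks: the second derived rule is dissimilar, so the loop stops there.
rounds = iter([("rule_b", [1.0, 0.1]), ("rule_c", [0.0, 1.0])])
rules, tuples = bootstrap(
    rules={"rule_a"},
    tuples={("Beijing", "China")},
    extract_tuples=lambda r: {("Paris", "France")},
    derive_rule=lambda t: next(rounds),
    diff_vec=[1.0, 0.0],
    threshold=0.8,
)
```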
8. A knowledge-graph building system, comprising:
a first module for acquiring text data;
a second module for processing the text data to obtain a plurality of word parameters;
a third module for determining a difference vector from the text data and the plurality of word parameters, wherein the difference vector is determined by:
acquiring a relation rule base and a tuple database;
extracting a relation rule in the text data according to first tuple data in the tuple database;
determining sentence vectors according to the relation rules and the word parameters;
determining principal component feature vectors of the relation rule base through a principal component analysis method;
determining a difference vector in the sentence vector according to the principal component feature vector;
a fourth module for updating the relation rule base according to the difference vector, wherein the relation rule base is updated by:
extracting the text data according to the relation rule base to obtain second tuple data, and updating the tuple database;
determining a new difference vector and a new relation rule according to the text data and the plurality of word parameters based on the new tuple database;
and determining the similarity between the new difference vector and the difference vector, and updating the relation rule base according to the new relation rule when the similarity is greater than a preset value.
9. A knowledge-graph building apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of knowledge-graph construction according to any one of claims 1 to 7.
10. A computer-readable storage medium in which a processor-executable program is stored, wherein the processor-executable program, when executed by the processor, is for implementing the method of knowledge-graph construction according to any one of claims 1 to 7.
CN202111505685.1A 2021-12-10 2021-12-10 Knowledge graph construction method, system, device and storage medium Active CN114372150B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111505685.1A CN114372150B (en) 2021-12-10 2021-12-10 Knowledge graph construction method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111505685.1A CN114372150B (en) 2021-12-10 2021-12-10 Knowledge graph construction method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN114372150A (en) 2022-04-19
CN114372150B (en) 2024-05-07

Family

ID=81139764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111505685.1A Active CN114372150B (en) 2021-12-10 2021-12-10 Knowledge graph construction method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN114372150B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110489751A (en) * 2019-08-13 2019-11-22 腾讯科技(深圳)有限公司 Text similarity computing method and device, storage medium, electronic equipment
CN111309925A (en) * 2020-02-10 2020-06-19 同方知网(北京)技术有限公司 Knowledge graph construction method of military equipment
WO2021139229A1 (en) * 2020-07-31 2021-07-15 平安科技(深圳)有限公司 Text rhetorical sentence generation method, apparatus and device, and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
韦韬;王金华;: "基于非分类关系提取技术的知识图谱构建", 工业技术创新, no. 02, 30 April 2020 (2020-04-30), pages 23 - 28 *

Also Published As

Publication number Publication date
CN114372150B (en) 2024-05-07

Similar Documents

Publication Publication Date Title
CN112101190B (en) Remote sensing image classification method, storage medium and computing device
US20210382937A1 (en) Image processing method and apparatus, and storage medium
US11232141B2 (en) Method and device for processing an electronic document
CN108073902B (en) Video summarizing method and device based on deep learning and terminal equipment
CN111783713B (en) Weak supervision time sequence behavior positioning method and device based on relation prototype network
CN111950728B (en) Image feature extraction model construction method, image retrieval method and storage medium
CN110135505B (en) Image classification method and device, computer equipment and computer readable storage medium
CN109918498B (en) Problem warehousing method and device
CN111767796A (en) Video association method, device, server and readable storage medium
CN111368887B (en) Training method of thunderstorm weather prediction model and thunderstorm weather prediction method
CN114283350B (en) Visual model training and video processing method, device, equipment and storage medium
CN110188422B (en) Method and device for extracting feature vector of node based on network data
CN110825894A (en) Data index establishing method, data index retrieving method, data index establishing device, data index retrieving device, data index establishing equipment and storage medium
CN113011529B (en) Training method, training device, training equipment and training equipment for text classification model and readable storage medium
CN116795947A (en) Document recommendation method, device, electronic equipment and computer readable storage medium
CN114329711A (en) Prefabricated part data processing method and system based on graph computation platform
CN113821657A (en) Artificial intelligence-based image processing model training method and image processing method
CN114372150B (en) Knowledge graph construction method, system, device and storage medium
CN116578700A (en) Log classification method, log classification device, equipment and medium
CN110909551B (en) Language pre-training model updating method and device, electronic equipment and storage medium
CN114239842A (en) Information processing apparatus, information processing system, and information processing method
CN113901175A (en) Article relation judging method and device
CN113705589A (en) Data processing method, device and equipment
CN111708908A (en) Video tag adding method and device, electronic equipment and computer-readable storage medium
CN113935387A (en) Text similarity determination method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant