CN113553439A

CN113553439A - Method and system for knowledge graph mining

Info

Publication number: CN113553439A
Application number: CN202110678441.7A
Authority: CN
Inventors: 高鹏; 郝少春; 袁兰; 吴飞; 周伟华; 高峰; 潘晶
Original assignee: Hangzhou Mjoys Big Data Technology Co ltd
Current assignee: Hangzhou Mjoys Big Data Technology Co ltd
Priority date: 2021-06-18
Filing date: 2021-06-18
Publication date: 2021-10-26

Abstract

The application relates to a method and a system for knowledge graph mining. The method comprises: acquiring a text and performing error correction on it; performing word segmentation and part-of-speech tagging on the corrected text according to a preset word list to obtain the words in the text and their parts of speech; identifying entities in the text according to the words and parts of speech, and extracting the attributes and relations of the entities according to the words, the parts of speech, and the entities; and performing entity linking on the entities, then performing knowledge fusion according to the entity linking results and the attributes and relations of the entities to obtain the knowledge graph.

Description

Method and system for knowledge graph mining

Technical Field

The application relates to the technical field of artificial intelligence, in particular to a method and a system for knowledge graph mining.

Background

A knowledge graph, known in library and information science as knowledge domain visualization or knowledge domain mapping, is a family of graphs that display the development and structure of knowledge. It uses visualization techniques to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interrelations among knowledge resources and their carriers. Knowledge graphs are widely used in question answering, search, recommendation, and other fields.

In the related art, knowledge graph mining requires manual participation: information is mined offline to obtain new knowledge, which is then periodically written into the knowledge graph's storage. As a result, updates to the knowledge graph lag considerably behind the appearance of new knowledge.

No effective solution has yet been proposed for this large update lag of knowledge graphs in the related art.

Disclosure of Invention

The embodiments of the application provide a method and a system for knowledge graph mining that at least address the large lag in knowledge graph updates in the related art.

In a first aspect, an embodiment of the present application provides a method for knowledge graph mining, where the method includes:

acquiring a text and carrying out error correction processing on the text;

performing word segmentation and part-of-speech tagging on the text subjected to error correction processing according to a preset word list to obtain words in the text and parts-of-speech of the words;

identifying entities in the text according to the words and the parts of speech, and extracting the attributes and the relations of the entities in the text according to the words, the parts of speech and the entities;

and carrying out entity linking according to the entities, and carrying out knowledge fusion according to the entity linking result, the attributes of the entities and the relationship to obtain a knowledge graph.

In some embodiments, the word list is constructed as follows:

adopting a plurality of part-of-speech tagging tools, and configuring the parts of speech in the plurality of part-of-speech tagging tools into parts of speech in a target part-of-speech tagging set;

acquiring basic data for constructing the word list, splitting the basic data into sentences, and inputting the sentences into the plurality of part-of-speech tagging tools to obtain tagging results, wherein the tagging results comprise the words of the basic data and their parts of speech;

and, when at least two tagging tools produce the same tagging result, recording that result, counting how often it occurs, and generating the word list from the recorded results and their frequencies.

In some of these embodiments, the identification process of the entity includes:

performing entity recognition separately with a dictionary and with a recognition model;

when the recognition result of the dictionary is the same as that of the recognition model, adopting the entity words in the recognition result;

and when the recognition result of the dictionary is empty and the confidence of the recognition model's result reaches a confidence threshold, storing the entity words recognized by the model together with their associated information, wherein the associated information comprises the dialogue sentences in which the entity words appear.

In some embodiments, the attributes and relations of the entities are extracted as follows:

inputting the words in the text, their parts of speech, and the entities into a syntactic analysis model to obtain an analysis result, wherein the analysis result comprises subject-predicate-object relations, in each of which at least one of the subject and the object is an entity;

and, using a labeled property graph as the basic data structure, constructing an attribute of the entity when one of the subject and the object of a subject-predicate-object relation is an entity, and constructing a relation between the entities when both the subject and the object are entities.

In some embodiments, entity linking comprises candidate entity recall, which proceeds as follows:

inputting the entity into a word vector model to determine its near-synonyms, yielding a first set, and inputting the entity into a BERT model to determine its near-synonyms, yielding a second set, wherein the word vector model is trained on the word list;

merging the first set and the second set to obtain the entity's near-synonym set;

determining which words of the near-synonym set exist in the knowledge graph to obtain a list of recalled entities.

In some embodiments, text error correction comprises error detection and error correction, and error detection comprises:

inputting the text into a classification model to determine whether it contains an erroneous sentence;

and recalling similar characters for each Chinese character in the erroneous sentence, wherein the similar characters include characters with a similar shape or a similar sound.

In a second aspect, an embodiment of the present application provides a system for knowledge graph mining, where the system includes:

the acquisition module is used for acquiring a text and correcting the text;

the word segmentation module is used for performing word segmentation and part-of-speech tagging on the text after error correction processing according to a preset word list to obtain words and parts-of-speech of the words in the text;

the extraction module is used for identifying entities in the text according to the words and the parts of speech and extracting the attributes and the relations of the entities in the text according to the words, the parts of speech and the entities;

and the fusion module is used for performing entity linking on the entities and for performing knowledge fusion according to the entity linking result and the attributes and relations of the entities to obtain a knowledge graph.

In some embodiments, the word segmentation module is further configured to:

acquire basic data for constructing the word list, split the basic data into sentences, and input the sentences into a plurality of part-of-speech tagging tools to obtain tagging results, wherein the parts of speech of all the tagging tools are configured as parts of speech in a target part-of-speech tag set, and the tagging results comprise the words of the basic data and their parts of speech;

and, when at least two tagging tools produce the same tagging result, record that result, count how often it occurs, and generate the word list from the recorded results and their frequencies.

In a third aspect, the present application provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of knowledge-graph mining when executing the computer program.

In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method for knowledge-graph mining.

Compared with the related art, the method for knowledge graph mining provided by the embodiments of the application acquires a text and corrects its errors; performs word segmentation and part-of-speech tagging on the corrected text according to a preset word list to obtain the words in the text and their parts of speech; identifies entities according to the words and parts of speech and extracts the entities' attributes and relations according to the words, the parts of speech, and the entities; and performs entity linking, followed by knowledge fusion based on the linking results and the entities' attributes and relations, to obtain the knowledge graph. This solves the large update lag of knowledge graphs in the related art and allows the knowledge graph to be updated in a timely manner.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a schematic diagram of an application environment of a method of knowledge-graph mining according to an embodiment of the present application;

FIG. 2 is a flow chart of a method of knowledge-graph mining according to a first embodiment of the present application;

FIG. 3 is a flow chart of a process of text correction according to a second embodiment of the present application;

FIG. 4 is a flow chart of a vocabulary construction process according to a third embodiment of the present application;

FIG. 5 is a flow chart of an entity identification process according to a fourth embodiment of the present application;

FIG. 6 is a flow chart of an extraction process of attributes and relationships of entities according to a fifth embodiment of the present application;

FIG. 7 is a flowchart of an entity linking process according to a sixth embodiment of the present application;

FIG. 8 is a flow chart of a method of knowledge-graph mining according to a seventh embodiment of the present application;

FIG. 9 is a block diagram of a system for knowledge-graph mining according to an eighth embodiment of the present application;

FIG. 10 is an internal structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.

It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.

Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.

Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words throughout this application do not limit number and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "And/or" describes an association between associated objects and means that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it. The terms "first," "second," "third," and the like merely distinguish similar objects and do not denote a particular order.

The method for knowledge graph mining provided by the application can be applied in the application environment shown in FIG. 1, which is a schematic diagram of that environment according to an embodiment of the application. As shown in FIG. 1, a server 102 obtains text data from a terminal 101 over a network and runs the knowledge graph mining method to mine knowledge from the text data; the server 102 then adds the knowledge to, or updates it in, the knowledge graph. The server 102 may be implemented as a standalone server or as a cluster of servers, and the terminal 101 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device.

This embodiment provides a method for knowledge graph mining. FIG. 2 is a flowchart of the method according to a first embodiment of the present application; as shown in FIG. 2, the flow includes the following steps:

step S201, acquiring a text and performing error correction on it; for example, the text newly added within each short time interval (e.g. every 30 seconds) is acquired and corrected, and the text may be dialogue content from a human-machine dialogue scenario;

step S202, performing word segmentation and part-of-speech tagging on the corrected text according to a preset word list to obtain the words in the text and their parts of speech; optionally, a method combining an n-gram model with a hidden Markov model (HMM) may be used for segmentation and tagging;

step S203, identifying entities in the text according to the words and parts of speech, and extracting the attributes and relations of the entities according to the words, the parts of speech, and the entities, wherein entity types mainly include person names, organization names, place names, dates, times, quantities, and the like; for example, an entity may be "morals";

and step S204, performing entity linking on the entities, and performing knowledge fusion according to the entity linking results and the attributes and relations of the entities to obtain a knowledge graph.
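Step S202 segments text against a preset word list that carries word frequencies. The source mentions combining an n-gram model with an HMM; as a simplified, swapped-in illustration only, the sketch below performs unigram maximum-probability segmentation by dynamic programming over a hypothetical word-frequency list. It omits HMM handling of out-of-vocabulary words and omits part-of-speech tagging.

```python
import math
from typing import Dict, List

def segment(text: str, word_freq: Dict[str, int], max_len: int = 4) -> List[str]:
    """Unigram maximum-probability segmentation over a word-frequency list.

    word_freq is a hypothetical word list mapping each word to its count.
    Characters missing from the list fall back to a count of 1 so that every
    position stays reachable; unknown multi-character strings are never words.
    max_len must be at least the length of the longest listed word.
    """
    total = sum(word_freq.values()) or 1
    n = len(text)
    # best[i] = (log-probability of the best split of text[:i], start of its last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            count = word_freq.get(word, 0)
            if count <= 0:
                if len(word) > 1:
                    continue  # skip unknown multi-character strings
                count = 1  # fallback for unknown single characters
            score = best[start][0] + math.log(count / total)
            if score > best[end][0]:
                best[end] = (score, start)
    # Follow the stored start offsets backwards to recover the words.
    words: List[str] = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]
```

For example, `segment("知识图谱挖掘", {"知识": 50, "图谱": 40, "知识图谱": 30, "挖掘": 20})` returns `["知识图谱", "挖掘"]`, because the two-word split has a higher total log-probability than the three-word split.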

Through steps S201 to S204, this embodiment acquires text; performs error correction, word segmentation, and part-of-speech tagging on it; identifies the entities in the text and extracts their attributes and relations; and performs entity linking and knowledge fusion to obtain the knowledge graph. Newly generated human-machine dialogue text can therefore be analyzed to obtain the knowledge it contains, and new knowledge produced during dialogue can be written into the knowledge graph promptly. This solves the large update lag of knowledge graphs in the related art and keeps the knowledge graph up to date.

In addition, this embodiment requires no manual mining or review, which reduces labor input and lowers the cost of updating the knowledge graph.

In some embodiments, fig. 3 is a flowchart of a text error correction process according to a second embodiment of the present application, and as shown in fig. 3, the process includes the following steps:

step S301, inputting the text into a classification model to determine whether it contains an erroneous sentence, wherein the classification model may be a Bi-LSTM model;

step S302, recalling similar characters for each Chinese character in the erroneous sentence, wherein the similar characters include characters with a similar shape or a similar sound;

step S303, inputting the text and the similar characters into a BERT model and taking the character with the highest probability as the candidate character; for example, the text and the similar characters are input into a TinyBERT model, and the character that makes the text most fluent is taken as the candidate character;

step S304, determining whether each Chinese character is the same as its candidate character; if not, replacing the Chinese character with the candidate character, and otherwise leaving it unchanged.
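The two-stage flow of steps S301 to S304 (detect first, then correct only flagged sentences) can be sketched as follows. This is a minimal illustration under stated assumptions: `has_error` stands in for the Bi-LSTM classifier, `confusion_set` for the similar-shape and similar-sound recall, and `score` for the BERT fluency scoring; all three are hypothetical callables, not models from the source.

```python
from typing import Callable, Dict, Set

def correct_sentence(
    sentence: str,
    has_error: Callable[[str], bool],       # hypothetical stand-in for the Bi-LSTM classifier (S301)
    confusion_set: Dict[str, Set[str]],     # hypothetical similar-shape / similar-sound table (S302)
    score: Callable[[str], float],          # hypothetical stand-in for BERT fluency scoring (S303)
) -> str:
    """Detect first; only sentences flagged as erroneous are corrected character by character."""
    if not has_error(sentence):
        return sentence  # error-free sentences skip the costly correction stage
    chars = list(sentence)
    for i, original in enumerate(chars):
        # The original character comes first so that it wins any tie in max().
        candidates = [original] + sorted(confusion_set.get(original, set()) - {original})
        # Pick the candidate that makes the whole sentence most fluent.
        best = max(candidates, key=lambda c: score("".join(chars[:i] + [c] + chars[i + 1:])))
        if best != original:  # S304: replace only when the candidate differs
            chars[i] = best
    return "".join(chars)
```

With a toy bigram scorer that knows "知识", "识图", and "图谱", and a confusion set mapping "普" to "谱", the sentence "知识图普" is flagged and corrected to "知识图谱". Note that later positions are scored against the already-corrected prefix, which is one possible design choice.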

Through steps S301 to S304, unlike the related art, in which text is corrected character by character and correction efficiency is low, this embodiment splits error correction into two steps: errors are detected first, and only sentences found to contain errors are then corrected. This greatly improves correction efficiency, which saves time in the knowledge mining process and speeds up knowledge graph updates.

Because knowledge graph updates in the embodiments of the present application involve no manual review, the mined knowledge must be as accurate as possible to avoid adding wrong content to the knowledge graph. In some embodiments, FIG. 4 is a flowchart of a word list construction process according to a third embodiment of the present application; as shown in FIG. 4, the process includes the following steps:

step S401, adopting a plurality of part-of-speech tagging tools and configuring the parts of speech of all of them as parts of speech in a target part-of-speech tag set; optionally, the tools may be segmentation and tagging tools such as a Chinese word segmenter, Baidu LAC, Peking University's PKU tool, and Harbin Institute of Technology's LTP, and the target tag set may be the 863 part-of-speech tag set;

step S402, acquiring basic data for constructing the word list, splitting it into sentences, and inputting the sentences into the plurality of tagging tools to obtain tagging results, wherein the tagging results comprise the words of the basic data and their parts of speech; optionally, the basic data may be publicly available datasets and an in-house dialogue corpus;

and step S403, when at least two tagging tools produce the same tagging result, recording that result, counting how often it occurs, and generating the word list from the recorded results and their frequencies, wherein the frequencies can serve as a reference for subsequent word segmentation and part-of-speech tagging.
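The voting rule of steps S401 to S403 can be sketched as below. This assumes each tool's output has already been mapped onto the target tag set, and interprets "the same tagging result" as the same word with the same tag at the same character offsets; the tagger outputs in the example are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Token = Tuple[str, str]           # (word, part-of-speech tag in the target tag set)
Span = Tuple[int, int, str, str]  # (start offset, end offset, word, tag)

def to_spans(tokens: List[Token]) -> List[Span]:
    """Attach character offsets so identical words at different positions stay distinct."""
    spans, offset = [], 0
    for word, tag in tokens:
        spans.append((offset, offset + len(word), word, tag))
        offset += len(word)
    return spans

def build_word_list(tool_outputs: List[List[List[Token]]], min_agree: int = 2) -> Counter:
    """tool_outputs[t][s] is tool t's tagging of sentence s.

    A (word, tag) pair is counted once per occurrence on which at least
    min_agree tools agree; the resulting counts are the word-list frequencies.
    """
    word_list: Counter = Counter()
    for per_sentence in zip(*tool_outputs):  # the same sentence as tagged by every tool
        votes: Counter = Counter()
        for tokens in per_sentence:
            votes.update(set(to_spans(tokens)))
        for (_, _, word, tag), n in votes.items():
            if n >= min_agree:
                word_list[(word, tag)] += 1
    return word_list
```

If two tools tag "我爱北京" as 我/r 爱/v 北京/ns and a third as 我/r 爱北京/v, then 我, 爱, and 北京 are recorded, while 爱北京 (produced by only one tool) is discarded.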

Through steps S401 to S403, this embodiment addresses the low accuracy of segmentation and tagging in the related art, which relies on a single segmentation and tagging tool. Configuring the parts of speech of all tools as parts of speech in the target tag set unifies their tagging standards, and recording a tagging result only when at least two tools agree improves the accuracy of segmentation and tagging and, in turn, the accuracy of the mined knowledge.

In some embodiments, fig. 5 is a flowchart of an entity identification process according to a fourth embodiment of the present application, and as shown in fig. 5, the flowchart includes the following steps:

step S501, performing entity recognition separately with a dictionary and with a recognition model, wherein the recognition model may be built on a BiLSTM-CRF architecture, in which the BiLSTM makes effective use of the context of the text features and the CRF learns the dependencies between labels;

step S502, when the recognition result of the dictionary is the same as that of the recognition model, adopting the entity words in the recognition result;

and step S503, when the dictionary's recognition result is empty and the confidence of the recognition model's result reaches the confidence threshold, storing the entity words recognized by the model together with their associated information, such as the dialogue sentences in which they appear and their frequency of occurrence; optionally, the confidence threshold may be set to 0.8.
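The combination rule of steps S501 to S503 can be sketched for a single sentence as follows. It is a minimal sketch under assumptions: the dictionary and model outputs are given as plain sets and dictionaries, "the same result" is interpreted as per-word agreement (set intersection), and cases the source does not cover (for example, a non-empty dictionary result that disagrees with the model) simply yield no adopted entity. The 0.8 default threshold comes from the text.

```python
from typing import Dict, List, Set, Tuple

def combine_recognition(
    sentence: str,
    dict_entities: Set[str],            # entity words found by dictionary lookup
    model_entities: Dict[str, float],   # entity words from the model (e.g. BiLSTM-CRF) with confidences
    pending: List[Tuple[str, str, float]],  # store of (word, sentence, confidence) for later manual review
    threshold: float = 0.8,
) -> Set[str]:
    """Return the entity words adopted for mining from one sentence."""
    if not dict_entities:
        # S503: the dictionary found nothing; keep confident model results for review only.
        for word, confidence in model_entities.items():
            if confidence >= threshold:
                pending.append((word, sentence, confidence))
        return set()
    # S502: adopt only the entity words on which both methods agree.
    return dict_entities & set(model_entities)
```

Entities kept in `pending` are not used for the current mining run; they correspond to the stored data that can later be reviewed and, if appropriate, added to the dictionary.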

Through steps S501 to S503, this embodiment addresses the low accuracy of the related art, which uses either a dictionary or a model alone for entity recognition. The two recognition methods are combined, and entity words are adopted only when both methods produce the same result, which improves recognition accuracy and therefore the accuracy of the mined knowledge.

Meanwhile, the embodiment also stores the entity words for which the dictionary returns nothing but the recognition model's confidence reaches the threshold. This does not interrupt the current knowledge mining process, and the stored data can later be reviewed manually in periodic batches: taking into account factors such as each entity word's frequency and confidence, a reviewer decides whether to add it to the dictionary. The dictionary's vocabulary thus keeps growing, which benefits subsequent dictionary-based entity recognition.

In some embodiments, fig. 6 is a flowchart of an extraction process of attributes and relationships of entities according to a fifth embodiment of the present application, and as shown in fig. 6, the process includes the following steps:

step S601, inputting the words in the text, their parts of speech, and the entities into a syntactic analysis model to obtain an analysis result. Because knowledge in the embodiments of the present application is written into the knowledge graph automatically, the related-art approach of annotating syntactic relations in advance to obtain gold-standard data is not applicable. Instead, the embodiments adopt an unsupervised, data-driven method that extracts the subject-predicate-object relations from the analysis result, in each of which at least one of the subject and the object is an entity. Specifically, the analysis result may include the subject-verb relation SBV (for example, "baby birth"), the attribute (modifier) relation ATT (for example, "university teaching"), the verb-object relation VOB (for example, "three days and three nights"), and the head relation HED (for example, "this is the cheapest taxi for engendering");

step S602, using a labeled property graph as the basic data structure, constructing an attribute of the entity when one of the subject and the object of a subject-predicate-object relation is an entity, and constructing a relation between the entities when both the subject and the object are entities. With a labeled property graph as the basic data structure, the properties of existing entities and relations can be continuously refined while new entities and relations keep being added.
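Steps S601 and S602 can be illustrated with a small in-memory labeled property graph. This is a sketch under assumptions: the dependency parse is supplied by hand as (head index, label) pairs using the SBV/VOB labels named above, triples are formed from the SBV and VOB dependents of the same predicate, and the attribute convention (predicate as key, the non-entity word as value) is a hypothetical choice rather than the source's exact rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Node:
    label: str  # entity type, e.g. "PER"
    properties: Dict[str, str] = field(default_factory=dict)

@dataclass
class LabeledPropertyGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    # (head entity, relation label, tail entity) -> edge properties
    edges: Dict[Tuple[str, str, str], Dict[str, str]] = field(default_factory=dict)

    def node(self, name: str, label: str) -> Node:
        if name not in self.nodes:
            self.nodes[name] = Node(label)
        return self.nodes[name]

def extract_spo(words: List[str], arcs: List[Tuple[int, str]]) -> List[Tuple[str, str, str]]:
    """arcs[i] = (index of the head word, dependency label) for words[i]; the root's head is -1."""
    triples = []
    for p, predicate in enumerate(words):
        subjects = [i for i, (h, rel) in enumerate(arcs) if h == p and rel == "SBV"]
        objects = [i for i, (h, rel) in enumerate(arcs) if h == p and rel == "VOB"]
        triples += [(words[s], predicate, words[o]) for s in subjects for o in objects]
    return triples

def add_triples(graph: LabeledPropertyGraph, triples, entity_types: Dict[str, str]) -> None:
    for subj, pred, obj in triples:
        if subj in entity_types and obj in entity_types:
            # Both ends are entities: add a relation edge.
            graph.node(subj, entity_types[subj])
            graph.node(obj, entity_types[obj])
            graph.edges.setdefault((subj, pred, obj), {})
        elif subj in entity_types:
            # Only the subject is an entity: record an attribute on it.
            graph.node(subj, entity_types[subj]).properties[pred] = obj
        elif obj in entity_types:
            # Only the object is an entity: record an attribute on it.
            graph.node(obj, entity_types[obj]).properties[pred] = subj
        # Triples containing no entity are ignored.
```

Parsing "张三 喜欢 李四" yields a relation edge between two person entities, whereas "张三 喜欢 音乐" yields the attribute 喜欢 = 音乐 on the node 张三.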

Through steps S601 to S602, the analysis result is restricted to subject-predicate-object relations, which clearly express entities and their attributes and relations. Knowledge derived from these relations is therefore highly reliable, which improves the accuracy of the mined knowledge. Moreover, because the mined knowledge is accurate, the embodiments are also suitable for building a knowledge graph for a domain that the graph does not yet cover: by collecting relevant data or historical dialogue records from that domain and running the knowledge graph mining method, a knowledge graph for the domain can be mined quickly and conveniently.

In some embodiments, entity linking includes candidate entity recall and mention matching. FIG. 7 is a flowchart of an entity linking process according to a sixth embodiment of the present application; as shown in FIG. 7, the flow includes the following steps:

step S701, inputting the entity into a word vector model to determine its near-synonyms, yielding a first set, and inputting the entity into a BERT model to determine its near-synonyms, yielding a second set, wherein the word vector model is trained on the word list;

step S702, merging the first set and the second set to obtain the entity's near-synonym set;

step S703, determining which words of the near-synonym set exist in the knowledge graph to obtain a list of recalled entities; the entities in this list are referred to as candidate entities;

step S704, replacing the entity in the original sentence with each candidate entity to obtain a new sentence containing that candidate, and feeding the original sentence and the new sentence together into a Siamese network to determine whether they are similar; if they are, the entity in the original sentence can be linked to the candidate entity in the knowledge graph. The network is trained with the contrastive loss function:

$$L(W, Y, X_1, X_2) = (1 - Y)\cdot\frac{1}{2}(D_W)^2 + Y\cdot\frac{1}{2}\{\max(0, m - D_W)\}^2$$

where D_W is the distance between the network's outputs for the two sentences, Y = 0 indicates that the original sentence and the new sentence are similar (matched), and Y = 1 indicates that they are dissimilar (not matched); optionally, the margin m may be set relatively high, for example to 0.9;

step S705, performing knowledge fusion according to the entity linking results to obtain the knowledge graph. Specifically, when an entity cannot be linked to an existing entity in the knowledge graph, the entity is added to the graph; if a single entity was extracted, its attributes are added, and if a pair of entities was extracted, the relation between them is added. When an entity can be linked to an existing entity in the knowledge graph: if a single entity was extracted, it is determined whether the entity already has the attribute; if so, the attribute is updated and the modification time is recorded, and if not, the attribute is added. If a pair of entities was extracted, it is determined whether a relation between them already exists; if so, the relation is updated and the modification time is recorded, and if not, the relation is added.
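The contrastive loss of step S704 can be computed directly from its definition. The sketch below assumes D_W is the Euclidean distance between the two sentence encodings (the usual choice for this loss) and adds a hypothetical distance-threshold rule for deciding a match at inference time; the Siamese encoder itself is not shown.

```python
import math
from typing import Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance D_W between two encoded sentences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(d_w: float, y: int, m: float) -> float:
    """Loss for one pair: y = 0 for similar pairs, y = 1 for dissimilar pairs, m is the margin."""
    similar_term = (1 - y) * 0.5 * d_w ** 2             # pulls similar pairs together
    dissimilar_term = y * 0.5 * max(0.0, m - d_w) ** 2  # pushes dissimilar pairs beyond the margin
    return similar_term + dissimilar_term

def is_match(orig_vec: Sequence[float], new_vec: Sequence[float], threshold: float) -> bool:
    """Hypothetical decision rule: link when the two encodings are closer than threshold."""
    return euclidean(orig_vec, new_vec) < threshold
```

A dissimilar pair that is already farther apart than the margin contributes zero loss, so training effort concentrates on similar pairs that are too far apart and dissimilar pairs that are too close.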

Through steps S701 to S705, the present application avoids duplicate content in the knowledge graph through similar-word query, candidate entity recall, mention matching, and knowledge fusion, keeping the content of the knowledge graph clean and systematic. In addition, when recalling candidate entities, the related art obtains synonyms using only a BERT model, which may miss some synonyms; this embodiment merges the near-synonyms from the word vector model with those from the BERT model, which reduces such omissions.
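Candidate entity recall (steps S701 to S703) reduces to a set union followed by a membership filter. In the sketch below, the word vector model is approximated by a hypothetical cosine nearest-neighbour lookup over a small vector table, and the BERT-based synonym source is passed in as an arbitrary callable; neither is the source's trained model.

```python
import math
from typing import Callable, Dict, Iterable, List, Sequence, Set

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest_neighbors(vectors: Dict[str, Sequence[float]], word: str, k: int = 5) -> List[str]:
    """Hypothetical stand-in for the word vector model: top-k words by cosine similarity."""
    if word not in vectors:
        return []
    others = [w for w in vectors if w != word]
    others.sort(key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)
    return others[:k]

def recall_candidates(
    entity: str,
    first_source: Callable[[str], Iterable[str]],   # word-vector near-synonyms (S701, first set)
    second_source: Callable[[str], Iterable[str]],  # BERT near-synonyms (S701, second set)
    kg_entities: Set[str],
) -> List[str]:
    synonyms = set(first_source(entity)) | set(second_source(entity))  # S702: merge the two sets
    return sorted(w for w in synonyms if w in kg_entities)             # S703: keep words already in the graph
```

Because the two sources are merged before filtering, a synonym found by only one of them is still recalled, which is the point made in the paragraph above.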

In some embodiments, FIG. 8 is a flowchart of a method of knowledge graph mining according to a seventh embodiment of the present application. As shown in FIG. 8, the method comprises: obtaining a user's dialogue text and preprocessing it, where preprocessing includes text error correction, word segmentation, and part-of-speech tagging; performing knowledge extraction on the preprocessed text, including named entity recognition, attribute extraction, and relation extraction; performing entity linking on the extracted entities, including similar-word query, candidate entity recall, and mention matching; and performing knowledge fusion according to the entity linking results, including entity fusion, attribute fusion, and relation fusion, to obtain an updated knowledge graph. Newly generated dialogue text is acquired continuously and the mining process is run on it, so the knowledge graph is updated continuously. The dialogue robot thus keeps learning on its own during real conversations, and the content of the knowledge graph keeps improving as dialogues accumulate.
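The fusion rules of step S705 (add when unlinked; update with a modification time, or add, when linked) can be sketched with a small in-memory store. This is a hypothetical illustration: `linked_to` is assumed to hold the name of the existing graph entity when entity linking succeeded, or `None` otherwise, and relations are keyed by the (head, tail) entity pair.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple

Record = Dict[str, object]

class FusionStore:
    """Hypothetical in-memory store illustrating the fusion rules of step S705."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Dict[str, Record]] = {}  # entity -> attribute -> record
        self.relations: Dict[Tuple[str, str], Record] = {}  # (head, tail) -> record

    def _resolve(self, entity: str, linked_to: Optional[str]) -> str:
        # Linked entities reuse the existing node; unlinked ones are added as new nodes.
        name = linked_to if linked_to is not None else entity
        self.attributes.setdefault(name, {})
        return name

    def fuse_attribute(self, entity: str, linked_to: Optional[str], key: str, value: str,
                       now: Optional[datetime] = None) -> None:
        attrs = self.attributes[self._resolve(entity, linked_to)]
        if key in attrs:  # existing attribute: update it and record the modification time
            attrs[key] = {"value": value, "modified": now if now is not None else datetime.now()}
        else:             # new attribute: add it
            attrs[key] = {"value": value, "modified": None}

    def fuse_relation(self, head: str, head_link: Optional[str], tail: str, tail_link: Optional[str],
                      relation: str, now: Optional[datetime] = None) -> None:
        pair = (self._resolve(head, head_link), self._resolve(tail, tail_link))
        if pair in self.relations:  # existing relation: update it and record the modification time
            self.relations[pair] = {"relation": relation, "modified": now if now is not None else datetime.now()}
        else:                       # new relation: add it
            self.relations[pair] = {"relation": relation, "modified": None}
```

For example, if the mention "小张" is linked to the existing entity "张三", a newly extracted occupation updates the attribute stored under "张三" instead of creating a duplicate node.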

The present embodiment further provides a system for knowledge graph mining. Fig. 9 is a structural block diagram of the system for knowledge graph mining according to an eighth embodiment of the present application. As shown in fig. 9, the system includes:

the acquiring module 91 is configured to acquire a text and perform error correction processing on the text;

the word segmentation module 92 is configured to perform word segmentation and part-of-speech tagging on the error-corrected text according to a preset word list, so as to obtain the words in the text and the parts of speech of the words;

the extraction module 93 is configured to identify entities in the text according to the words and the parts of speech, and extract attributes and relationships of the entities in the text according to the words, the parts of speech, and the entities;

and the fusion module 94 is configured to perform entity linking according to the entities, and perform knowledge fusion according to the result of the entity linking and the attributes and relationships of the entities to obtain a knowledge graph.

In some embodiments, the word segmentation module 92 is further configured to:

acquire basic data for constructing the word list, segment the basic data, and input the segmented basic data into a plurality of part-of-speech tagging tools to obtain tagging results, wherein the parts of speech used by each of the tagging tools are all mapped to parts of speech in a target part-of-speech tag set, and the tagging results comprise the words of the basic data and the parts of speech of those words;

and, in a case where the tagging results obtained by at least two tagging tools are the same, record the tagging result, count the frequency with which the tagging result occurs, and generate the word list according to the tagging results and their frequencies.
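The agreement rule for building the word list can be sketched as follows. The sketch assumes that every tagging tool receives the same segmented sentences and returns one tag per word, and that each tool comes with a hypothetical mapping from its native tags to the target tag set (tags missing from a mapping are assumed to already belong to the target set). The tool interface and the function name build_word_list are illustrative, not taken from the application or from any particular tagging library. A (word, part of speech) result is recorded whenever at least min_agree tools agree on it, and its frequency is accumulated over the basic data.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# A tagging tool maps a segmented sentence to one native tag per word (hypothetical interface).
Tagger = Callable[[List[str]], List[str]]


def build_word_list(
    sentences: Iterable[List[str]],
    tools: Sequence[Tuple[Tagger, Dict[str, str]]],
    min_agree: int = 2,
) -> List[Tuple[str, str, int]]:
    """Return (word, tag, frequency) entries sorted by descending frequency."""
    counts: Counter = Counter()
    for words in sentences:
        # Tag the sentence with every tool and map each tool's tags into the target tag set.
        per_tool = [[tag_map.get(tag, tag) for tag in tagger(words)] for tagger, tag_map in tools]
        for i, word in enumerate(words):
            votes = Counter(tags[i] for tags in per_tool)
            for tag, n in votes.items():
                # Record the tagging result only if at least `min_agree` tools produced it.
                if n >= min_agree:
                    counts[(word, tag)] += 1
    return sorted(
        ((word, tag, freq) for (word, tag), freq in counts.items()),
        key=lambda entry: (-entry[2], entry[0], entry[1]),
    )
```

Raising min_agree trades coverage for reliability: fewer entries reach the word list, but each one has been confirmed by more tools.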

In one embodiment, an electronic device is provided, which may be a server; fig. 10 is a schematic diagram of the internal structure of the electronic device according to an embodiment of the present application. As shown in fig. 10, the electronic device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the electronic device is used for storing data. The network interface of the electronic device is used for communicating with an external terminal through a network connection. When executed by the processor, the computer program implements a method of knowledge graph mining.

Those skilled in the art will appreciate that the structure shown in fig. 10 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the electronic devices to which the solution applies; a particular electronic device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.

It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).

It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.

The above embodiments express only several implementations of the present application, and although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims

1. A method of knowledge-graph mining, the method comprising:

acquiring a text and carrying out error correction processing on the text;

2. The method of claim 1, wherein the vocabulary building process comprises:

3. The method of claim 1, wherein the identification process of the entity comprises:

4. The method of claim 1, wherein the extracting of the attributes and relationships of the entities comprises:

5. The method of claim 2, wherein the entity linking comprises a candidate entity recall, and wherein the candidate entity recall comprises:

6. The method of claim 1, wherein the text error correction process comprises error checking and error correction, and wherein the error checking process comprises:

7. A system for knowledge-graph mining, the system comprising:

the acquisition module is configured to acquire a text and perform error correction processing on the text;

8. The system of claim 7, wherein the word segmentation module is further configured to:

9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements a method of knowledge-graph mining as claimed in any one of claims 1 to 6.

10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of knowledge-graph mining according to any one of claims 1 to 6.