CN112749284B - Knowledge graph construction method, device, equipment and storage medium - Google Patents

Knowledge graph construction method, device, equipment and storage medium

Info

Publication number
CN112749284B
Authority
CN
China
Prior art keywords
key
service
knowledge
text
nouns
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011635788.5A
Other languages
Chinese (zh)
Other versions
CN112749284A (en
Inventor
杜振中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202011635788.5A
Publication of CN112749284A
Application granted
Publication of CN112749284B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

The invention relates to the field of data analysis and discloses a knowledge graph construction method, device, equipment and storage medium for automatically constructing a knowledge graph for a specific service field. The knowledge graph construction method comprises the following steps: calling an information crawler tool to acquire service texts of different types of services from a service database; inputting the service texts of each type of service into a proper name recognition model to obtain a key noun set; acquiring, from an information database and according to the key noun set, entity relationships among the key nouns in the key noun set; calling a text classification model to analyze the correlation of the key nouns and the corresponding entity relationships with the selected service type, and obtaining an analysis result; and removing key nouns irrelevant to the selected service type from the key noun set and deleting the corresponding entity relationships to obtain the knowledge graph of the selected service type. In addition, the invention also relates to blockchain technology, and information related to the knowledge graph can be stored in a blockchain.

Description

Knowledge graph construction method, device, equipment and storage medium
Technical Field
The invention relates to the field of data analysis, in particular to a knowledge graph construction method, a knowledge graph construction device, knowledge graph construction equipment and a storage medium.
Background
A knowledge graph, known in library and information science as knowledge domain visualization or knowledge domain mapping, is a series of graphs that display the development process and structural relationships of knowledge. It is a structured way of organizing knowledge, formed from relationships, entities and attributes, and once built it can be widely applied in fields such as semantic search, intelligent question answering and personalized recommendation.
In the prior art, mainstream knowledge graphs cover a broad range of knowledge fields and contain a large amount of knowledge irrelevant to the required field, so efficiency and accuracy are low when they are used in practical applications. Constructing a knowledge graph manually consumes a large amount of manpower and material resources, and the result is difficult to update and expand as knowledge changes. No existing method can automatically construct a knowledge graph for the currently required field.
Disclosure of Invention
The invention mainly aims to solve the problem that conventional knowledge graph construction methods cannot automatically construct a knowledge graph for the required field.
A first aspect of the invention provides a knowledge graph construction method, comprising:
calling an information crawler tool to acquire service texts of different types of services from a service database;
inputting the service text of each type of service into a pre-established proper name recognition model, and extracting key nouns to obtain a key noun set;
acquiring, according to the key noun set, entity relationships among the key nouns in the key noun set from a preset information database;
calling a pre-established text classification model to analyze the correlation of the key nouns and the corresponding entity relationships with the selected service type, and obtaining an analysis result;
and according to the analysis result, removing key nouns irrelevant to the selected service type from the key noun set, and deleting the corresponding entity relationships to obtain the knowledge graph of the selected service type.
Optionally, in a first implementation manner of the first aspect of the present invention, the invoking an information crawler tool to obtain service texts of different types of services from a service database includes:
sending a source code acquisition request to a target website in the service database, and reading a source code of the target website after the source code acquisition request passes;
downloading page data in the target website according to the source code of the target website;
and identifying the content in the page data to obtain service texts of different types of services.
Optionally, in a second implementation manner of the first aspect of the present invention, before the inputting the service text of each type of service into a pre-established proper name recognition model, and extracting a key noun to obtain a key noun set, the method further includes:
collecting text corpus information;
labeling words in the text corpus information, and performing sentence segmentation and recombination on the labeled text corpus information to obtain a text corpus training set;
and calling the text corpus training set to train the deep learning model to obtain a proper name recognition model.
Optionally, in a third implementation manner of the first aspect of the present invention, before the invoking a pre-established text classification model, analyzing the correlation between the key nouns and the corresponding entity relationships and the corresponding services, and obtaining an analysis result, the method further includes:
collecting the service corpora of the selected service type, and labeling service classification labels on the service corpora to obtain a corpus classification training set;
and acquiring a Bert pre-training model, taking the corpus classification training set as a newly added input vector of the Bert pre-training model, and performing fine-tuning training on the Bert pre-training model to obtain a trained text classification model.
Optionally, in a fourth implementation manner of the first aspect of the present invention, before the obtaining the Bert pre-training model, taking the corpus classification training set as a new input vector of the Bert pre-training model, and performing fine-tuning training on the Bert pre-training model to obtain a trained text classification model, the method further includes:
calling the text corpus training set to pre-train a bidirectional Transformer model to obtain initial parameters;
and storing the initial parameters to obtain a Bert pre-training model.
Optionally, in a fifth implementation manner of the first aspect of the present invention, before the obtaining, according to the key term set, an entity relationship between key terms in the key term set in a preset information database, the method further includes:
acquiring at least one knowledge resource website, wherein the knowledge resource website comprises Baidu Encyclopedia (Baidu Baike), China National Knowledge Infrastructure (CNKI) and MBA Think Tank (MBAlib);
calling an information crawler tool to crawl the at least one knowledge resource website to obtain data information in the at least one knowledge resource website;
and constructing an information database according to the data information.
Optionally, in a sixth implementation manner of the first aspect of the present invention, after the removing, according to the analysis result, the key nouns unrelated to the corresponding services from the key noun set, and deleting the corresponding entity relationships to obtain a service knowledge graph, the method further includes:
based on a received knowledge graph update request, calling an information crawler tool to acquire newly added service texts of different types of services from a service database;
inputting the newly added service text of each type of service into a pre-established proper name recognition model, and extracting key nouns to obtain a newly added key noun set;
acquiring entity relationships among all newly added key nouns in the newly added key noun set in a preset information database according to the newly added key noun set;
calling a pre-established text classification model, and analyzing the correlation between the newly-added key nouns and the corresponding entity relationship and the selected service type to obtain an analysis result;
and according to the analysis result, removing the newly added key nouns irrelevant to the selected service type from the newly added key noun set, deleting the corresponding entity relationship, and updating the knowledge graph of the selected service type.
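The update flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the graph is represented as a dictionary of node and edge sets, and `is_relevant` is a hypothetical callable standing in for the text classification model's judgment.

```python
def update_graph(graph, new_nouns, new_relations, is_relevant):
    """Return an updated copy of `graph` with relevant new nouns and relations.

    graph         -- {"nodes": set of str, "edges": set of (head, relation, tail)}
    new_nouns     -- newly added key nouns extracted from new service texts
    new_relations -- (head, relation, tail) triples among the new key nouns
    is_relevant   -- hypothetical stand-in for the text classification model
    """
    # Remove newly added key nouns that are irrelevant to the selected service type.
    kept = {noun for noun in new_nouns if is_relevant(noun)}
    # Delete relations that touch a removed noun.
    kept_edges = {
        (head, rel, tail)
        for head, rel, tail in new_relations
        if head in kept and tail in kept
    }
    # Merge into copies so the existing graph is left untouched.
    return {
        "nodes": set(graph["nodes"]) | kept,
        "edges": set(graph["edges"]) | kept_edges,
    }


graph = {
    "nodes": {"auto insurance", "premium"},
    "edges": {("premium", "paid for", "auto insurance")},
}
new_nouns = ["deductible", "claim", "apple"]
new_relations = [
    ("deductible", "reduces", "claim"),
    ("apple", "confused with", "claim"),
]
updated = update_graph(graph, new_nouns, new_relations, lambda noun: noun != "apple")
```

Returning a new dictionary rather than mutating the existing one keeps the previous graph available if an update needs to be rolled back; that choice is a design assumption of this sketch.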
The second aspect of the present invention provides a knowledge graph constructing apparatus, including:
the text acquisition module is used for calling the information crawler tool to acquire service texts of different types of services from the service database;
a noun extraction module, which is used for inputting the service text of each type of service into a pre-established proper name recognition model, and extracting key nouns to obtain a key noun set;
the entity relationship acquisition module is used for acquiring the entity relationship among key nouns in the key noun set in a preset information database according to the key noun set;
the correlation analysis module is used for calling a pre-established text classification model, analyzing the correlation between the key nouns and the corresponding entity relations and the selected service types and obtaining an analysis result;
and the knowledge graph establishing module is used for removing key nouns irrelevant to the selected service type from the key noun set according to the analysis result and deleting the corresponding entity relation to obtain the knowledge graph of the selected service type.
Optionally, in a first implementation manner of the second aspect of the present invention, the text obtaining module includes:
a source code acquisition unit, configured to send a source code acquisition request to a target website in the service database, and to read the source code of the target website after the source code acquisition request passes;
the data downloading unit is used for downloading the page data in the target website according to the source code of the target website;
and the service text identification unit is used for identifying the content in the page data to obtain service texts of different types of services.
Optionally, in a second implementation manner of the second aspect of the present invention, the knowledge graph constructing apparatus further includes a proper name recognition training module, where the proper name recognition training module specifically includes:
the text corpus collecting unit is used for collecting text corpus information;
the text corpus training set construction unit is used for labeling words in the text corpus information, and performing sentence division and recombination on the labeled text corpus information to obtain a text corpus training set;
and the training unit is used for calling the text corpus training set to train the deep learning model to obtain a proper name recognition model.
Optionally, in a third implementation manner of the second aspect of the present invention, the knowledge-graph constructing apparatus further includes an information database constructing module, where the information database constructing module includes:
a knowledge resource website acquisition unit, configured to acquire at least one knowledge resource website, wherein the knowledge resource website comprises Baidu Encyclopedia (Baidu Baike), China National Knowledge Infrastructure (CNKI) and MBA Think Tank (MBAlib);
the data information crawling unit is used for calling an information crawler tool to crawl the at least one knowledge resource website to obtain data information in the at least one knowledge resource website;
and the information database construction unit is used for constructing an information database according to the data information.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the knowledge graph constructing apparatus further includes a Bert fine-tuning training module, where the Bert fine-tuning training module specifically includes:
a service corpus collecting unit, configured to collect service corpora of the selected service type, label service classification labels to the service corpora, and obtain a corpus classification training set;
and the fine tuning training unit is used for obtaining a Bert pre-training model, using the corpus classification training set as a newly added input vector of the Bert pre-training model, and performing fine tuning training on the Bert pre-training model to obtain a trained text classification model.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the knowledge graph constructing apparatus further includes a Bert pre-training module, where the Bert pre-training module is specifically configured to:
calling the text corpus training set to pre-train a bidirectional Transformer model to obtain initial parameters; and storing the initial parameters to obtain a Bert pre-training model.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the knowledge-graph constructing apparatus further includes a knowledge-graph updating module, where the knowledge-graph updating module is specifically configured to:
based on a received knowledge graph update request, calling an information crawler tool to acquire newly added service texts of different types of services from a service database; inputting the newly added service text of each type of service into a pre-established proper name recognition model, and extracting key nouns to obtain a newly added key noun set; acquiring, according to the newly added key noun set, entity relationships among the newly added key nouns in the newly added key noun set from a preset information database; calling a pre-established text classification model to analyze the correlation of the newly added key nouns and the corresponding entity relationships with the selected service type, and obtaining an analysis result; and according to the analysis result, removing the newly added key nouns irrelevant to the selected service type from the newly added key noun set, deleting the corresponding entity relationships, and updating the knowledge graph of the selected service type.
A third aspect of the present invention provides knowledge graph construction equipment, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the knowledge graph construction equipment to perform the steps of the knowledge graph construction method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the above-described method of knowledge-graph construction.
In the technical scheme provided by the invention, an information crawler tool is called to obtain service texts of different types of services from a service database; the service texts of each type of service are input into a proper name recognition model to obtain a key noun set; entity relationships among the key nouns in the key noun set are acquired from an information database according to the key noun set; a text classification model is called to analyze the correlation of the key nouns and the corresponding entity relationships with the selected service type, and an analysis result is obtained; and key nouns irrelevant to the selected service type are removed from the key noun set, and the corresponding entity relationships are deleted, to obtain the knowledge graph of the selected service type.
In the embodiment of the invention, the service knowledge of the required field can be automatically collected, the knowledge graph is automatically constructed aiming at the required field, the knowledge correlation degree of the knowledge graph in the required service field is improved, and the construction operation of the knowledge graph is simplified.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a knowledge graph construction method in an embodiment of the invention;
FIG. 2 is a schematic diagram of another embodiment of a knowledge graph construction method in the embodiment of the invention;
FIG. 3 is a schematic diagram of another embodiment of the knowledge-graph construction method in the embodiment of the invention;
FIG. 4 is a schematic diagram of another embodiment of the knowledge-graph construction method in the embodiment of the invention;
FIG. 5 is a schematic diagram of an embodiment of a knowledge graph constructing apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of another embodiment of the knowledge-graph constructing apparatus in the embodiment of the present invention;
FIG. 7 is a schematic diagram of an embodiment of the knowledge graph construction equipment in an embodiment of the invention.
Detailed Description
The embodiment of the invention provides a method, a device, equipment and a storage medium for constructing a knowledge graph, which can automatically collect business knowledge in a required field, automatically construct the knowledge graph aiming at the required field, improve the knowledge correlation degree of the knowledge graph in the required business field and simplify the operation of constructing the knowledge graph.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below. Referring to FIG. 1, an embodiment of the knowledge graph construction method in an embodiment of the present invention includes:
101. Calling an information crawler tool to acquire service texts of different types of services from a service database;
It is to be understood that the execution subject of the present invention may be a knowledge graph constructing apparatus, or may be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as the execution subject.
To construct the knowledge graph automatically, a service database is obtained first, wherein the service database comprises service texts of different types of services. Specifically, the service database may be assembled by collecting relevant websites. For example, insurance-related websites include insurance industry forums, news websites, and the insurance sections of portal websites. After these websites are collected, they are categorized and their URLs (Uniform Resource Locators) are recorded, wherein a URL is a notation for specifying the location of information on the World Wide Web service of the Internet. The websites and their URLs are then organized into the service database. Alternatively, websites may be found automatically by keyword search, and the service database is obtained after the search results are sorted and organized.
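As a minimal sketch of how such a service database might be organized, the websites can be stored as a mapping from service type to a list of URLs. The function name, the example URLs and the validation rule are all illustrative assumptions, not details taken from the patent.

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_service_database(sites):
    """Group (service_type, url) pairs into {service_type: [url, ...]}."""
    database = defaultdict(list)
    for service_type, url in sites:
        parsed = urlparse(url)
        # Keep only well-formed http(s) URLs (an assumption of this sketch).
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        # Record each URL once per service type.
        if url not in database[service_type]:
            database[service_type].append(url)
    return dict(database)


# Hypothetical websites, categorized by service type.
sites = [
    ("insurance", "https://forum.example.com/insurance"),
    ("insurance", "https://news.example.com/insurance"),
    ("insurance", "https://forum.example.com/insurance"),  # duplicate entry
    ("banking", "https://portal.example.com/banking"),
    ("banking", "not-a-url"),  # rejected by the URL check
]
service_db = build_service_database(sites)
```

Keying the database by service type makes it straightforward to crawl only the websites of the service type for which a graph is being built.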
After the service database is obtained, an information crawler tool is called to crawl the web page content in the service database. The information crawler tool generally refers to a web crawler, also called a web spider or a web robot, which is a program or script that automatically captures web information according to certain rules.
According to the URL of each website in the service database, the information crawler tool is called to crawl the text data on each page of the website, wherein the text data on each page includes structured text in the tables contained in the web page and unstructured text in the body of the web page.
The data on the website pages comprises service texts of different service types. In this proposal, only the service type for which the knowledge graph is to be built needs to be acquired, and the websites in the service database can be adjusted as required; for example, when service texts in the insurance field are needed, the web pages of insurance-related forums or news sites can be added to the service database. The information crawler tool is called to crawl these texts to obtain the service texts, and the crawled service texts are stored.
102. Inputting the service texts of each type of service into a pre-established proper name recognition model, and extracting key nouns to obtain a key noun set;
After the information crawler tool has been called in the previous step to obtain a certain amount of service texts of each type, the pre-established proper name recognition model is called to extract nouns from those service texts. Here, proper name recognition, also called Named Entity Recognition (NER), refers to recognizing entities with specific meanings in text, including names of people, places and organizations, proper nouns, and the like.
The crawled service text is input into the pre-established proper name recognition model, which consists of two LSTM neural network models: the first LSTM model performs word segmentation on the crawled text content, and the second LSTM model labels the segmented text content. Specifically, the proper name recognition model labels the text content with the BIO labeling method, in which the beginning character of a word with entity meaning is labeled B, the subsequent (inside) characters of that word are labeled I, and characters without entity meaning are labeled O, so that complete words with entity meaning can be recovered. In this step, after the pre-trained proper name recognition model has labeled the service text with the BIO labeling method, the words with entity meaning in the labeled service text are extracted as key nouns, and the obtained key nouns form the key noun set.
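The extraction of key nouns from BIO labels can be illustrated with a short sketch. It assumes the labels have already been produced by a model and that plain `B`, `I` and `O` tags are used without entity-type suffixes; the function names and the example sentence are illustrative.

```python
def extract_entities(chars, tags):
    """Collect each maximal B, I, I, ... run from a BIO-tagged character sequence."""
    entities, current = [], []
    for ch, tag in zip(chars, tags):
        if tag == "B":
            # A new entity starts; close any entity still open.
            if current:
                entities.append("".join(current))
            current = [ch]
        elif tag == "I" and current:
            # Continue the entity that is currently open.
            current.append(ch)
        else:
            # "O", or a stray "I" with no preceding "B": close any open entity.
            if current:
                entities.append("".join(current))
            current = []
    if current:
        entities.append("".join(current))
    return entities


def key_noun_set(chars, tags):
    """De-duplicate extracted entities while preserving first-seen order."""
    nouns = []
    for entity in extract_entities(chars, tags):
        if entity not in nouns:
            nouns.append(entity)
    return nouns


# "车险" (auto insurance) and "理赔" (claim settlement) are tagged as entities;
# the characters of "流程" (process) are tagged O purely for illustration.
chars = list("车险理赔流程")
tags = ["B", "I", "B", "I", "O", "O"]
nouns = key_noun_set(chars, tags)
```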
103. Acquiring entity relationships among key nouns in a key noun set in a preset information database according to the key noun set;
In this step, an information database is preset. To ensure the accuracy of the knowledge graph, the data sources of the information database are screened when it is established, and the data information from the screened data sources is used to build the information database.
After the information database is obtained, the data in it is retrieved and screened according to the key noun set obtained in the previous step, so as to obtain the entity relationships among the key nouns in that set; the obtained entity relationships are then stored temporarily.
In this proposal, an entity refers to a key noun in the key noun set obtained in the previous step, and a relationship refers to the way in which data objects are connected to each other. Entities may have a one-to-one relationship with each other, or a one-to-many relationship. In addition, the entity relationship model also includes attributes: an attribute is a characteristic of an entity, and an entity can be characterized by several attributes.
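One way to picture the retrieval step is to treat the information database as a list of (head, relation, tail) triples and keep only those whose two ends are both key nouns. The triple representation, the `Relation` type and the example data are assumptions made for this sketch; the patent does not specify a storage format.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    head: str      # entity on the left-hand side
    relation: str  # how the two entities are connected
    tail: str      # entity on the right-hand side


def relations_among(key_nouns, information_db):
    """Return the relations whose head and tail are both key nouns."""
    nouns = set(key_nouns)
    return [r for r in information_db if r.head in nouns and r.tail in nouns]


# Hypothetical contents of the information database.
information_db = [
    Relation("auto insurance", "covers", "collision damage"),
    Relation("auto insurance", "is a", "property insurance"),
    Relation("apple", "is a", "fruit"),
    Relation("premium", "paid for", "auto insurance"),
]
key_nouns = ["auto insurance", "premium", "collision damage", "apple"]
found = relations_among(key_nouns, information_db)
```

Here "auto insurance" takes part in two retained relations, which is an instance of the one-to-many case described above.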
104. Calling a pre-established text classification model, and analyzing the correlation between the key nouns and the corresponding entity relations and the selected service types to obtain an analysis result;
After the entity relationships among the key nouns in the key noun set are obtained, the key nouns and entity relationships are input into a pre-established text classification model for processing, wherein the text classification model may be built on a deep learning algorithm. The text classification model is called to analyze the correlation of the obtained key nouns and entity relationships with the selected service type.
For example, when a service knowledge graph for the insurance field needs to be established, after the insurance-related key nouns and the entity relationships between them are obtained in the foregoing steps, the text classification model is invoked to analyze the key nouns and their entity relationships, specifically to analyze whether each key noun is related to the insurance field, and an analysis result is obtained. The analysis result includes a judgment of whether the two are related and a correlation degree index.
105. And according to the analysis result, removing key nouns irrelevant to the selected service type from the key noun set, and deleting the corresponding entity relation to obtain a knowledge graph of the selected service type.
In the step of obtaining the key nouns, nouns are merely extracted from the acquired service texts. Because some key nouns have multiple meanings, the acquired entity relationships may include many that are irrelevant to the service for which the graph is being built. Therefore, according to the analysis result obtained in the previous step, the key nouns irrelevant to the selected service type are removed from the key noun set, and the corresponding entity relationships are deleted.
In addition, the corresponding key nouns and entity relationships are sorted by priority according to the correlation degree index in the analysis result. The knowledge graph of the selected service type is then constructed from the key noun set with the irrelevant key nouns removed, the remaining relevant entity relationships, and their correlation degree indexes.
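The filtering and prioritization described in this step can be sketched as below. The `relevance` dictionary stands in for the correlation degree index produced by the text classification model, and the 0.5 threshold and the rule for ranking an edge by its weaker endpoint are assumptions of this sketch rather than values from the patent.

```python
def build_graph(key_nouns, relations, relevance, threshold=0.5):
    """Drop irrelevant key nouns and their relations, then sort by relevance.

    relevance -- {noun: score in [0, 1]}, a stand-in for the classifier output
    threshold -- assumed cut-off below which a noun counts as irrelevant
    """
    kept = {n for n in key_nouns if relevance.get(n, 0.0) >= threshold}
    # Most relevant nouns first; ties broken alphabetically for determinism.
    nodes = sorted(kept, key=lambda n: (-relevance[n], n))
    # A relation survives only if both of its entities survive.
    edges = [(h, r, t) for h, r, t in relations if h in kept and t in kept]
    # Rank each edge by the weaker of its two endpoints (an assumption).
    edges.sort(key=lambda e: (-min(relevance[e[0]], relevance[e[2]]), e))
    return {"nodes": nodes, "edges": edges}


key_nouns = ["premium", "apple", "auto insurance"]
relations = [
    ("premium", "paid for", "auto insurance"),
    ("apple", "mentioned with", "premium"),
]
relevance = {"auto insurance": 0.97, "premium": 0.88, "apple": 0.05}
insurance_graph = build_graph(key_nouns, relations, relevance)
```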
The embodiment of the invention can collect the service knowledge of the required field, automatically construct the knowledge graph aiming at the required field, improve the knowledge correlation degree of the knowledge graph in the required service field and simplify the construction operation of the knowledge graph.
Referring to fig. 2, another embodiment of the method for constructing a knowledge graph according to an embodiment of the present invention includes:
201. Sending a source code acquisition request to a target website in a service database, and reading a source code of the target website after the source code acquisition request passes;
To construct the knowledge graph automatically, a service database is obtained first, wherein the service database comprises service texts of different types of services. Specifically, the service database may be assembled by collecting relevant websites. For example, insurance-related websites include insurance industry forums, news websites, and the insurance sections of portal websites. After these websites are collected, they are categorized and their URLs (Uniform Resource Locators) are recorded, wherein a URL is a notation for specifying the location of information on the World Wide Web service of the Internet. The websites and their URLs are then organized into the service database. Alternatively, websites may be found automatically by keyword search, and the service database is obtained after the search results are sorted and organized.
After the service database is obtained, a source code acquisition request is sent to a target website in the service database. The source code acquisition request is sent as an HTTP request that includes a request header, and the request for the source code is initiated to the target website according to the information preset in the request header. After the source code acquisition request passes, the source code of the target website is read.
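A source code acquisition request of this kind might look like the following sketch, which uses only Python's standard `urllib`. The header values, the function names and the example URL are illustrative assumptions; the patent does not state which header fields are preset.

```python
from urllib.request import Request, urlopen


def make_source_request(url, user_agent="kg-crawler-sketch/0.1"):
    """Build an HTTP GET request whose header carries the preset information."""
    headers = {"User-Agent": user_agent, "Accept": "text/html"}
    return Request(url, headers=headers, method="GET")


def read_source(request, timeout=10):
    """Send the request and return the page source as text.

    urlopen raises an error for 4xx/5xx responses, so the source is only
    read when the request has passed.
    """
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Building the request does not touch the network; read_source(req) would.
req = make_source_request("https://forum.example.com/insurance")
```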
202. Downloading page data in the target website according to the source code of the target website; identifying the content in the page data to obtain service texts of different types of services;
and after the source code of the target website is read, calling a downloading function to download and store the page data in the target website. After the page data is stored locally, identifying the page data, ignoring irrelevant data in the page, only reserving a structured text, a semi-structured text and an unstructured text in the page data, and enabling the structured text, the semi-structured text and the unstructured text to form a service text.
Specifically, the structured text and the semi-structured text are mainly derived from tables and the like in the page data, and the unstructured text is mainly derived from text contents in the page data.
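One way to separate table content from body text can be sketched as follows. It assumes the page data is HTML and uses the standard-library `html.parser`; the class name `ServiceTextExtractor` and the choice to skip `script` and `style` content as "irrelevant data" are assumptions made for illustration.

```python
from html.parser import HTMLParser

class ServiceTextExtractor(HTMLParser):
    """Collect table rows (structured or semi-structured text) and body text (unstructured)."""
    SKIP = {"script", "style"}  # treated as irrelevant data (assumption)

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cell strings per table row
        self.paragraphs = []  # free text found outside tables
        self._skip_depth = 0
        self._in_table = 0
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "table":
            self._in_table += 1
        elif tag == "tr" and self._in_table:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "table" and self._in_table:
            self._in_table -= 1
        elif tag in ("td", "th") and self._cell is not None:
            self._row.append(" ".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._cell is not None:
            self._cell.append(text)
        elif not self._in_table:
            self.paragraphs.append(text)

def extract_service_text(html):
    """Return the retained text of one page, grouped by origin."""
    parser = ServiceTextExtractor()
    parser.feed(html)
    parser.close()
    return {"tables": parser.rows, "text": parser.paragraphs}
```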
203. Collecting text corpus information; labeling words in the text corpus information, and performing sentence segmentation and recombination on the labeled text corpus information to obtain a text corpus training set;
First, a large amount of text corpus information is collected, the words or phrases in it are classified and labeled, and entity words with specific meanings are given BIO labels, yielding labeled corpus information.
The labeled corpus information is then split into sentences and recombined, word by word or phrase by phrase, to produce processed labeled corpus information. When the amount of labeled corpus information is small, this generates additional processed samples and multiplies the size of the corpus. The original labeled corpus information and the processed labeled corpus information are stored together to form the text corpus training set.
In addition, when text corpus information is collected, the share of corpus information from the field of the knowledge graph to be built can be increased, which improves accuracy in subsequent recognition.
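The segmentation-and-recombination idea in step 203 can be sketched as follows. The patent does not specify how sentences are recombined; this sketch assumes one simple reading, concatenating every ordered pair of distinct sentences while keeping each token aligned with its BIO label. The function names and the punctuation set are hypothetical.

```python
from itertools import permutations

# Sentence-final punctuation used as split points (assumption).
SENTENCE_END = {"。", ".", "!", "?", "！", "？"}

def split_sentences(tagged):
    """Split a list of (token, BIO-tag) pairs at sentence-final punctuation."""
    sentences, current = [], []
    for token, tag in tagged:
        current.append((token, tag))
        if token in SENTENCE_END and tag == "O":
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences

def recombine(sentences, size=2):
    """Concatenate every ordered selection of `size` distinct sentences into a new sample."""
    samples = []
    for combo in permutations(sentences, size):
        samples.append([pair for sentence in combo for pair in sentence])
    return samples
```

With three labeled sentences and `size=2`, this yields six new samples, which illustrates how a small labeled corpus can be multiplied.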
204. Calling a text corpus training set to train the deep learning model to obtain a proper name recognition model;
The text corpus training set is input into a deep learning model, and the model is trained to obtain the proper name recognition model. Proper name recognition, also called Named Entity Recognition (NER), refers to recognizing entities with specific meanings in text, including person names, place names, organization names, proper nouns, and the like.
The deep learning model may be built with a Long Short-Term Memory (LSTM) artificial neural network. Specifically, the text corpus training set is divided into a training set, a test set, and a validation set. After training, it is determined whether the recognition result falls within a preset recognition error range; if it does, training is complete and the proper name recognition model is obtained.
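The data split and the stopping check can be sketched independently of any particular LSTM library. The 80/10/10 ratios, the 5% error threshold, and the per-tag error measure below are assumptions for illustration; the patent only says the set is divided three ways and that the result is compared with a preset error range.

```python
import random

def split_corpus(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle and divide samples into training, test, and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_test = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation

def tag_error_rate(predicted, gold):
    """Fraction of tags that differ between predicted and gold tag sequences."""
    total = sum(len(seq) for seq in gold)
    wrong = sum(p != g for ps, gs in zip(predicted, gold) for p, g in zip(ps, gs))
    return wrong / total if total else 0.0

def training_complete(predicted, gold, max_error=0.05):
    """Training is considered complete when the error is within the preset range."""
    return tag_error_rate(predicted, gold) <= max_error
```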
205. Calling a pre-established proper name recognition model, and extracting key nouns from various types of service texts to obtain a key noun set;
After a certain amount of service text of each type has been obtained, key nouns are extracted from it using the pre-established proper name recognition model.
The crawled service text is input into the pre-established proper name recognition model, which consists of two LSTM neural network models: the first segments the crawled text into words, and the second labels the segmented text. Specifically, the proper name recognition model labels text with the BIO scheme: the first character of a word with entity meaning is labeled B, the subsequent characters inside that word are labeled I, and characters without entity meaning are labeled O, so that complete entity words can be recovered. In this step, after the service text has been BIO-labeled, the pre-trained proper name recognition model extracts the entity words from the labeled text, and these words form the key noun set.
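Recovering entity words from BIO labels can be sketched as follows. The function name `extract_entities` is hypothetical, and the sketch takes the labels as given rather than producing them with an LSTM.

```python
def extract_entities(tokens, tags, sep=""):
    """Collect each maximal B-then-I run from a BIO-labeled token sequence."""
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new entity starts; close any entity still open.
            if current:
                entities.append(sep.join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # "O" (or a stray "I" with no open entity) ends the current entity.
            if current:
                entities.append(sep.join(current))
            current = []
    if current:
        entities.append(sep.join(current))
    return entities

def key_noun_set(tokens, tags):
    """De-duplicate extracted entities while keeping first-seen order."""
    return list(dict.fromkeys(extract_entities(tokens, tags)))
```

For character-level Chinese text the default `sep=""` joins characters directly; for space-delimited languages `sep=" "` would be the natural choice.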
206. Acquiring entity relationships among key nouns in a key noun set in a preset information database according to the key noun set;
in the step, an information database is preset, wherein in order to ensure the accuracy of the knowledge graph, when the information database is established, the data sources of the information database are screened, and the data information in the screened data sources is called to establish the information database.
And after the information database is obtained, retrieving and screening the data in the information database according to the key noun set obtained in the previous step to obtain the entity relationship among the key nouns in the key noun set extracted in the previous step, and after the entity relationship among the key nouns is obtained, temporarily storing the obtained data.
In this proposal, an entity is a key noun in the key noun set obtained in the previous step, and a relationship (also called a relation) is the way in which data objects are connected to each other; the relationship between entities may be one-to-one or one-to-many. In addition, the entity relationship model includes attributes: an attribute is a characteristic of an entity, and an entity may be characterized by several attributes.
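The entity, relationship, and attribute concepts can be sketched as a small in-memory structure. The patent does not describe the storage layout of the information database; the triple representation and the names `Relation`, `InformationDatabase`, and `relations_among` are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Relation:
    head: str   # source entity (a key noun)
    name: str   # how the two entities are connected
    tail: str   # target entity (a key noun)

@dataclass
class InformationDatabase:
    relations: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)  # entity -> {attribute: value}

    def relations_among(self, key_nouns):
        """Return the relations whose two endpoints are both in the key noun set."""
        nouns = set(key_nouns)
        return [r for r in self.relations if r.head in nouns and r.tail in nouns]
```

One entity may appear as the head of several relations, which covers the one-to-many case.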
207. Calling a pre-established text classification model, and analyzing the correlation between the key nouns and the corresponding entity relations and the corresponding services to obtain an analysis result;
after the entity relationship among the key nouns in the key noun set is obtained, the obtained key nouns and the entity relationship are input into a pre-established text classification model for processing, wherein the text classification model can be established based on a deep learning algorithm. And calling the text classification model to analyze the relevance of the obtained key nouns and entity relations with corresponding services.
For example, when a service knowledge graph for the insurance field is to be established, the insurance-related key nouns and the entity relationships among them are obtained in the preceding steps; the text classification model is then called to analyze these key nouns and entity relationships, specifically to determine whether each key noun is related to the insurance field. The analysis result includes a judgment of whether the key noun is related to the field and a correlation degree index.
208. According to the analysis result, removing key nouns irrelevant to the selected service type from the key noun set, and deleting the corresponding entity relation to obtain a knowledge graph of the selected service type;
In the step of obtaining the key nouns, the nouns are simply extracted from the obtained service texts. Because some key nouns have multiple meanings, the entity relationships may include many that are irrelevant to the service for which the graph is being built. Therefore, according to the analysis result obtained in the preceding step, the key nouns irrelevant to the selected service type are removed from the key noun set, and the corresponding entity relationships are deleted.
In addition, the corresponding key nouns and entity relationships are sorted by priority according to the correlation degree index in the analysis result. The knowledge graph of the selected service type is then constructed from the key noun set remaining after the irrelevant key nouns are removed, the remaining relevant entity relationships, and the correlation degree index.
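The filtering and sorting in step 208 can be sketched as follows. The shape of the analysis result (a mapping from key noun to a related flag and a correlation degree index) and the rule of ranking an edge by the lower score of its two endpoints are assumptions; the patent does not fix either.

```python
def build_knowledge_graph(key_nouns, relations, analysis):
    """Drop irrelevant key nouns and their relations, then sort by correlation index.

    relations: iterable of (head, relation, tail) triples
    analysis:  key noun -> (is_related, correlation_index)   (hypothetical format)
    """
    kept = [n for n in key_nouns if analysis.get(n, (False, 0.0))[0]]
    kept.sort(key=lambda n: analysis[n][1], reverse=True)
    kept_set = set(kept)
    # A relation survives only if both of its entities survive.
    edges = [(h, r, t) for h, r, t in relations if h in kept_set and t in kept_set]
    edges.sort(key=lambda e: min(analysis[e[0]][1], analysis[e[2]][1]), reverse=True)
    return {"nodes": kept, "edges": edges}
```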
The embodiment of the invention can collect the service knowledge of the required field, automatically construct the knowledge graph aiming at the required field, improve the knowledge correlation degree of the knowledge graph in the required service field and simplify the construction operation of the knowledge graph.
Referring to fig. 3, another embodiment of the method for constructing a knowledge graph according to an embodiment of the present invention includes:
301. Sending a source code acquisition request to a target website in a service database, and reading the source code of the target website after the source code acquisition request passes;
To automate construction of the knowledge graph, a service database is obtained first; it contains service texts for different types of services. Specifically, the service database may be assembled by collecting relevant websites, or it may be obtained by searching automatically with keywords and the like and then sorting the search results.
After the service database is obtained, a source code acquisition request is sent to a target website in the service database. The request is sent as an HTTP request that includes a request header, and the request for the source code is initiated to the target website according to the information preset in the request header. After the source code acquisition request passes, the source code of the target website is read.
302. Downloading page data in the target website according to the source code of the target website; identifying the content in the page data to obtain service texts of different types of services;
and after the source code of the target website is read, calling a downloading function to download and store the page data in the target website. After the page data is stored locally, identifying the page data, ignoring irrelevant data in the page, only reserving a structured text, a semi-structured text and an unstructured text in the page data, and enabling the structured text, the semi-structured text and the unstructured text to form a service text.
Specifically, the structured text and the semi-structured text are mainly derived from tables and the like in the page data, and the unstructured text is mainly derived from text contents in the page data.
303. Calling a pre-established proper name recognition model, and extracting key nouns from various types of service texts to obtain a key noun set;
After a certain amount of service text of each type has been obtained, the pre-established proper name recognition model is called to extract key nouns from it.
The crawled service text is input into the pre-established proper name recognition model, which consists of two LSTM neural network models: the first segments the crawled text into words, and the second labels the segmented text. Specifically, the proper name recognition model labels text with the BIO scheme: the first character of a word with entity meaning is labeled B, the subsequent characters inside that word are labeled I, and characters without entity meaning are labeled O, so that complete entity words can be recovered. In this step, after the service text has been BIO-labeled, the pre-trained proper name recognition model extracts the entity words from the labeled text, and these words form the key noun set.
304. Acquiring entity relationships among key nouns in a key noun set in a preset information database according to the key noun set;
in the step, an information database is preset, wherein in order to ensure the accuracy of the knowledge graph, when the information database is established, the data sources of the information database are screened, and the data information in the screened data sources is called to establish the information database.
And after the information database is obtained, retrieving and screening the data in the information database according to the key noun set obtained in the previous step to obtain the entity relationship among the key nouns in the key noun set extracted in the previous step, and after the entity relationship among the key nouns is obtained, temporarily storing the obtained data.
In this proposal, an entity is a key noun in the key noun set obtained in the previous step, and a relationship (also called a relation) is the way in which data objects are connected to each other; the relationship between entities may be one-to-one or one-to-many. In addition, the entity relationship model includes attributes: an attribute is a characteristic of an entity, and an entity may be characterized by several attributes.
305. Calling a text corpus training set to pre-train a bidirectional Transformer model to obtain initial parameters; storing the initial parameters to obtain a Bert pre-training model;
specifically, before training the text classification model, a Bert pre-training model needs to be obtained in advance. The Bert pre-training model can directly obtain a pre-trained open-source Bert pre-training model on a network, and can also be automatically trained to obtain the Bert pre-training model.
Here, Bert stands for Bidirectional Encoder Representations from Transformers, that is, a bidirectional encoder representation model derived from the Transformer model.
Specifically, when the Bert pre-training model is obtained through training, the Transformer model can be trained on the pre-established text corpus training set. The Transformer model is trained by randomly masking words in the original corpus data set (MASK) to obtain the initial parameters, and the initial parameters are stored to obtain the Bert pre-training model.
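The random masking used to prepare pre-training inputs can be sketched as follows. Only the masking step is shown, not the Transformer training itself; the 15% masking probability is an assumed default, and the function name `mask_tokens` is hypothetical.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Randomly replace tokens with [MASK]; return the masked sequence and the targets.

    labels[i] holds the original token where position i was masked, else None,
    so a model can be trained to predict the hidden words from their context.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels
```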
306. Collecting the service corpora of the selected service types, and labeling service classification labels on the service corpora to obtain a corpus classification training set; acquiring a Bert pre-training model, taking a corpus classification training set as a newly added input vector of the Bert pre-training model, and performing fine-tuning training on the Bert pre-training model to obtain a trained text classification model;
and establishing a corpus classification training set in advance. Specifically, the business corpora are collected first, and labels are carried out on the business corpora to obtain a corpus classification training set.
And acquiring a Bert pre-training model, and inputting the corpus classification training set into the Bert pre-training model for Fine-tuning training (Fine-tuning).
In this proposal, the text classification model is called specifically to classify and recognize texts of a specific service. Therefore, after the service corpora are collected and labeled to obtain the corpus classification training set, that training set is used to fine-tune the Bert pre-training model so that the model can distinguish whether a text belongs to the specific service type, thereby obtaining the trained text classification model.
307. Calling a pre-established text classification model, and analyzing the correlation between the key nouns and the corresponding entity relations and the selected service types to obtain an analysis result;
after the entity relationship among the key nouns in the key noun set is obtained, the obtained key nouns and the entity relationship are input into a pre-established text classification model for processing, wherein the text classification model can be established based on a deep learning algorithm. And analyzing the relevance of the obtained key nouns and entity relations with corresponding services by using the text classification model.
For example, when a service knowledge graph for the insurance field is to be established, the insurance-related key nouns and the entity relationships among them are obtained in the preceding steps; the text classification model is then called to analyze these key nouns and entity relationships, specifically to determine whether each key noun is related to the insurance field. The analysis result includes a judgment of whether the key noun is related to the field and a correlation degree index.
308. And according to the analysis result, removing key nouns irrelevant to the selected service type from the key noun set, and deleting the corresponding entity relation to obtain a knowledge graph of the selected service type.
In the step of obtaining the key nouns, the nouns are simply extracted from the obtained service texts. Because some key nouns have multiple meanings, the entity relationships may include many that are irrelevant to the service for which the graph is being built. Therefore, according to the analysis result obtained in the preceding step, the key nouns irrelevant to the selected service type are removed from the key noun set, and the corresponding entity relationships are deleted.
In addition, the corresponding key nouns and entity relationships are sorted by priority according to the correlation degree index in the analysis result. The knowledge graph of the selected service type is then constructed from the key noun set remaining after the irrelevant key nouns are removed, the remaining relevant entity relationships, and the correlation degree index.
The embodiment of the invention can collect the service knowledge of the required field, automatically construct the knowledge graph aiming at the required field, improve the knowledge correlation degree of the knowledge graph in the required service field and simplify the construction operation of the knowledge graph.
Referring to fig. 4, another embodiment of the method for constructing a knowledge graph according to an embodiment of the present invention includes:
401. Sending a source code acquisition request to a target website in a service database, and reading the source code of the target website after the source code acquisition request passes;
To automate construction of the knowledge graph, a service database is obtained first; it contains service texts for different types of services. Specifically, the service database may be assembled by collecting relevant websites. For insurance, for example, relevant websites include insurance industry forums, news websites, and the insurance sections of portal websites. After collection, the websites are categorized and their URLs (uniform resource locators) are recorded, a URL being the notation that specifies the location of a resource on the Internet; the websites and their URLs are then organized into the service database. Alternatively, the service database may be obtained by searching automatically with keywords and the like and then sorting the search results.
After the service database is obtained, a source code acquisition request is sent to a target website in the service database. The request is sent as an HTTP request that includes a request header, and the request for the source code is initiated to the target website according to the information preset in the request header. After the source code acquisition request passes, the source code of the target website is read.
402. Downloading page data in the target website according to the source code of the target website; identifying the content in the page data to obtain service texts of different types of services;
And after the source code of the target website is read, calling a downloading function to download and store the page data in the target website. After the page data is stored locally, identifying the page data, ignoring irrelevant data in the page, only reserving a structured text, a semi-structured text and an unstructured text in the page data, and enabling the structured text, the semi-structured text and the unstructured text to form a service text.
Specifically, the structured text and the semi-structured text are mainly derived from tables and the like in the page data, and the unstructured text is mainly derived from text contents in the page data.
403. Calling a pre-established proper name recognition model, and extracting key nouns from various types of service texts to obtain a key noun set;
After a certain amount of service text of each type has been obtained, key nouns are extracted from it using the pre-established proper name recognition model.
The crawled service text is input into the pre-established proper name recognition model, which consists of two LSTM neural network models: the first segments the crawled text into words, and the second labels the segmented text. Specifically, the proper name recognition model labels text with the BIO scheme: the first character of a word with entity meaning is labeled B, the subsequent characters inside that word are labeled I, and characters without entity meaning are labeled O, so that complete entity words can be recovered. In this step, after the service text has been BIO-labeled, the pre-trained proper name recognition model extracts the entity words from the labeled text, and these words form the key noun set.
404. Acquiring at least one knowledge resource website; calling an information crawler tool to crawl at least one knowledge resource website to obtain data information in the at least one knowledge resource website; constructing an information database according to the data information;
the method comprises the steps of obtaining at least one knowledge resource website, calling an information crawler tool to crawl the obtained knowledge resource website to obtain data information in the at least one knowledge resource website, storing the data information, and constructing an information database so as to obtain entity relationships among key nouns in a preset information database according to the key nouns.
Specifically, the knowledge resource websites used for the information database may include Baidu Baike (Baidu Encyclopedia), CNKI (China National Knowledge Infrastructure), MBAlib, and the like. Building the information database from the content of such websites helps ensure the authority of the information, makes the information in the subsequently generated knowledge graph more credible, and further improves the knowledge quality of the knowledge graph established in this proposal.
405. Acquiring entity relationships among key nouns in a key noun set in a preset information database according to the key noun set;
in the step, an information database is preset, wherein in order to ensure the accuracy of the knowledge graph, when the information database is established, the data sources of the information database are screened, and the data information in the screened data sources is called to establish the information database.
And after the information database is obtained, retrieving and screening the data in the information database according to the key noun set obtained in the previous step to obtain the entity relationship among the key nouns in the key noun set extracted in the previous step, and after the entity relationship among the key nouns is obtained, temporarily storing the obtained data.
In this proposal, an entity is a key noun in the key noun set obtained in the previous step, and a relationship (also called a relation) is the way in which data objects are connected to each other; the relationship between entities may be one-to-one or one-to-many. In addition, the entity relationship model includes attributes: an attribute is a characteristic of an entity, and an entity may be characterized by several attributes.
406. Calling a text corpus training set to pre-train a bidirectional Transformer model to obtain initial parameters; storing the initial parameters to obtain a Bert pre-training model;
specifically, before training the text classification model, a Bert pre-training model needs to be obtained in advance. The Bert pre-training model can directly obtain a pre-trained open-source Bert pre-training model on a network, and can also be automatically trained to obtain the Bert pre-training model.
Here, Bert stands for Bidirectional Encoder Representations from Transformers, that is, a bidirectional encoder representation model derived from the Transformer model.
Specifically, when the Bert pre-training model is obtained through training, the Transformer model can be trained on the pre-established text corpus training set. The Transformer model is trained by randomly masking words in the original corpus data set (MASK) to obtain the initial parameters, and the initial parameters are stored to obtain the Bert pre-training model.
407. Collecting the service corpora of the selected service types, and labeling service classification labels on the service corpora to obtain a corpus classification training set; acquiring a Bert pre-training model, taking a corpus classification training set as a newly added input vector of the Bert pre-training model, and performing fine-tuning training on the Bert pre-training model to obtain a trained text classification model;
and establishing a corpus classification training set in advance. Specifically, the business corpora are collected first, and labels are carried out on the business corpora to obtain a corpus classification training set.
And acquiring a Bert pre-training model, and inputting the corpus classification training set into the Bert pre-training model for Fine-tuning training (Fine-tuning).
In this proposal, the text classification model is called specifically to classify and recognize texts of a specific service. Therefore, after the service corpora are collected and labeled to obtain the corpus classification training set, that training set is used to fine-tune the Bert pre-training model so that the model can distinguish whether a text belongs to the specific service type, thereby obtaining the trained text classification model.
408. Calling a pre-established text classification model, and analyzing the correlation between the key nouns and the corresponding entity relations and the selected service types to obtain an analysis result;
after the entity relationship among the key nouns in the key noun set is obtained, the obtained key nouns and the entity relationship are input into a pre-established text classification model for processing, wherein the text classification model can be established based on a deep learning algorithm. And calling the text classification model to analyze the relevance of the obtained key nouns and entity relations with corresponding services.
For example, when a service knowledge graph for the insurance field is to be established, the insurance-related key nouns and the entity relationships among them are obtained in the preceding steps; the text classification model is then called to analyze these key nouns and entity relationships, specifically to determine whether each key noun is related to the insurance field. The analysis result includes a judgment of whether the key noun is related to the field and a correlation degree index.
409. According to the analysis result, removing key nouns irrelevant to the selected service type from the key noun set, and deleting the corresponding entity relation to obtain a knowledge graph of the selected service type;
In the step of obtaining the key nouns, the nouns are simply extracted from the obtained service texts. Because some key nouns have multiple meanings, the entity relationships may include many that are irrelevant to the service for which the graph is being built. Therefore, according to the analysis result obtained in the preceding step, the key nouns irrelevant to the selected service type are removed from the key noun set, and the corresponding entity relationships are deleted.
In addition, the corresponding key nouns and entity relationships are sorted by priority according to the correlation degree index in the analysis result. The knowledge graph of the selected service type is then constructed from the key noun set remaining after the irrelevant key nouns are removed, the remaining relevant entity relationships, and the correlation degree index.
410. And updating the knowledge graph based on the received knowledge graph updating request.
Based on the received knowledge graph update request, the information crawler tool is called to obtain newly added service texts of different types of services from the service database. Specifically, newly added text is identified by checking the URLs of the websites in the service database. In this step, only the text content of the web pages corresponding to newly added URLs of the relevant websites is extracted, rather than all the text information of those websites, which greatly reduces the acquisition of repeated content.
Inputting the newly added service text of each type of service into a pre-established special name recognition model, and extracting key nouns to obtain a newly added key noun set;
acquiring entity relationships among all newly added key nouns in the newly added key noun set in a preset information database according to the newly added key noun set;
calling a pre-established text classification model, and analyzing the correlation between the newly-added key nouns and the corresponding entity relationship and the selected service type to obtain an analysis result;
and according to the analysis result, removing the newly added key nouns irrelevant to the selected service type from the newly added key noun set, deleting the corresponding entity relationship, and updating the knowledge graph of the selected service type.
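The incremental update in step 410 can be sketched as follows. Newly added pages are detected by comparing URLs against those already processed; `process_page` is a hypothetical callback standing in for steps 403 to 409 applied to a single page, and the dictionary layout of the graph is an assumption.

```python
def new_urls(known_urls, current_urls):
    """Return URLs present now but not seen in an earlier crawl, preserving order."""
    seen = set(known_urls)
    fresh = []
    for url in current_urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh

def update_graph(graph, known_urls, current_urls, process_page):
    """Merge nodes and edges from newly added pages only.

    known_urls:   a set of URLs already processed (updated in place)
    process_page: url -> (nodes, edges); hypothetical stand-in for the
                  extraction, relation lookup, and relevance filtering steps
    """
    for url in new_urls(known_urls, current_urls):
        nodes, edges = process_page(url)
        for node in nodes:
            if node not in graph["nodes"]:
                graph["nodes"].append(node)
        for edge in edges:
            if edge not in graph["edges"]:
                graph["edges"].append(edge)
        known_urls.add(url)
    return graph
```

Because already-known URLs are skipped before any page is fetched or processed, repeated content is never re-extracted.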
The embodiment of the invention can automatically collect the business knowledge of the required field, automatically construct the knowledge graph aiming at the required field, improve the knowledge correlation degree of the knowledge graph in the required business field and simplify the construction operation of the knowledge graph; in addition, the knowledge graph can be updated according to the received graph updating request, so that the knowledge graph constructed in the embodiment can update the knowledge graph content along with the updating and expansion of the knowledge in the field.
The knowledge graph construction method in the embodiment of the present invention has been described above; a knowledge graph constructing apparatus in the embodiment of the present invention is described below. Referring to fig. 5, an embodiment of the knowledge graph constructing apparatus includes:
a text obtaining module 501, configured to invoke an information crawler tool to obtain service texts of different types of services from a service database;
a noun extraction module 502, configured to input the service text of each type of service into a pre-established proper name recognition model, and extract key nouns to obtain a key noun set;
an entity relationship obtaining module 503, configured to obtain, from a preset information database and according to the key noun set, the entity relationships between the key nouns in the key noun set;
a correlation analysis module 504, configured to invoke a pre-established text classification model to analyze the correlation between the key nouns, together with their corresponding entity relationships, and the selected service type, so as to obtain an analysis result;
a knowledge graph establishing module 505, configured to, according to the analysis result, remove the key nouns that are irrelevant to the selected service type from the key noun set, and delete the corresponding entity relationship, so as to obtain a knowledge graph of the selected service type.
The embodiment of the present invention can automatically collect business knowledge of the required field and automatically construct a knowledge graph for that field, which improves the relevance of the knowledge in the graph to the required business field and simplifies the construction of the knowledge graph.
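The data flow through modules 502 to 505 can be sketched as follows. This is only an illustration under stated assumptions: the recognition model and the text classification model are replaced by caller-supplied functions (`extract_nouns`, `is_relevant`), the information database is a plain list of triples, relevance is judged per noun only, and every name is hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head noun, relation, tail noun)


def build_graph(
    texts: List[str],
    extract_nouns: Callable[[str], Set[str]],
    relation_db: List[Triple],
    is_relevant: Callable[[str], bool],
) -> Dict[str, List[Tuple[str, str]]]:
    # Module 502: extract key nouns from every service text.
    nouns: Set[str] = set()
    for text in texts:
        nouns |= extract_nouns(text)
    # Module 503: look up relationships whose two ends are both key nouns.
    relations = [t for t in relation_db if t[0] in nouns and t[2] in nouns]
    # Module 504: stand-in for the text classification model.
    kept = {n for n in nouns if is_relevant(n)}
    # Module 505: drop irrelevant nouns and the relationships that touch them.
    graph: Dict[str, List[Tuple[str, str]]] = {n: [] for n in sorted(kept)}
    for head, rel, tail in relations:
        if head in kept and tail in kept:
            graph[head].append((rel, tail))
    return graph


# Toy stand-ins for the two models and the information database.
vocabulary = {"insurance", "premium", "football"}
graph = build_graph(
    texts=["insurance premium football news"],
    extract_nouns=lambda text: {w for w in text.split() if w in vocabulary},
    relation_db=[("insurance", "has", "premium"), ("insurance", "sponsors", "football")],
    is_relevant=lambda noun: noun != "football",
)
```

In this toy run the noun "football" and the relationship attached to it are removed, leaving a graph restricted to the selected service type.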
Referring to fig. 6, another embodiment of the knowledge-graph constructing apparatus according to the embodiment of the present invention includes:
a text obtaining module 501, configured to invoke an information crawler tool to obtain service texts of different types of services from a service database;
a noun extraction module 502, configured to input the service text of each type of service into a pre-established proper name recognition model, and extract key nouns to obtain a key noun set;
an entity relationship obtaining module 503, configured to obtain, from a preset information database and according to the key noun set, the entity relationships between the key nouns in the key noun set;
a correlation analysis module 504, configured to invoke a pre-established text classification model to analyze the correlation between the key nouns, together with their corresponding entity relationships, and the selected service type, so as to obtain an analysis result;
a knowledge graph establishing module 505, configured to, according to the analysis result, remove the key nouns that are irrelevant to the selected service type from the key noun set, and delete the corresponding entity relationship, so as to obtain a knowledge graph of the selected service type.
Optionally, the text obtaining module 501 includes:
a source code obtaining unit 5011, configured to send a source code obtaining request to a target website in the service database, and to read the source code of the target website after the source code obtaining request passes;
a data downloading unit 5012, configured to download page data in the target website according to the source code of the target website;
a service text recognition unit 5013, configured to recognize content in the page data to obtain service texts of different types of services.
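The recognition step of unit 5013 can be illustrated with a small sketch that pulls visible text out of downloaded page source. It uses only the Python standard library; the class and function names are hypothetical, the sample page is invented, and the network request of units 5011 and 5012 is deliberately left out so the example runs offline.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the content of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(page_source: str) -> str:
    """Return the visible text of an HTML page as one string."""
    parser = TextExtractor()
    parser.feed(page_source)
    parser.close()
    return " ".join(parser.chunks)


# Invented page source standing in for downloaded page data.
sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Auto insurance</h1><p>Covers collision damage.</p>"
    "<script>var x = 1;</script></body></html>"
)
service_text = extract_text(sample)
```

A real crawler would also need to classify the extracted text by service type, which this sketch does not attempt.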
Optionally, the knowledge graph constructing apparatus further includes a proper name recognition training module 506, where the proper name recognition training module 506 specifically includes:
the text corpus collecting unit is used for collecting text corpus information;
the text corpus training set construction unit is used for labeling words in the text corpus information, and performing sentence division and recombination on the labeled text corpus information to obtain a text corpus training set;
and the training unit is used for calling the text corpus training set to train the deep learning model to obtain a proper name recognition model.
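One possible reading of the labeling and sentence division and recombination steps is sketched below: words are tagged with BIO labels from a small annotation dictionary, the corpus is split into sentences, and consecutive sentences are regrouped into samples of bounded length. The BIO scheme, the whitespace tokenization, the `max_tokens` limit, and all names are assumptions made for illustration; the source does not specify them. Labels are applied per sentence here for simplicity.

```python
import re
from typing import Dict, List, Tuple

Tagged = List[Tuple[str, str]]  # (token, BIO label) pairs


def label_tokens(tokens: List[str], entities: Dict[str, str]) -> Tagged:
    """Tag tokens with BIO labels using a phrase -> entity type dictionary."""
    tagged = [(tok, "O") for tok in tokens]
    for phrase, etype in entities.items():
        words = phrase.split()
        n = len(words)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == words:
                tagged[i] = (tokens[i], "B-" + etype)
                for j in range(i + 1, i + n):
                    tagged[j] = (tokens[j], "I-" + etype)
    return tagged


def build_training_set(corpus: str, entities: Dict[str, str], max_tokens: int = 12) -> List[Tagged]:
    """Split the corpus into sentences, label them, and regroup into samples."""
    sentences = [s.strip() for s in re.split(r"[.!?。！？]", corpus) if s.strip()]
    samples: List[Tagged] = []
    current: Tagged = []
    for sentence in sentences:
        tagged = label_tokens(sentence.split(), entities)
        # Start a new sample when adding this sentence would exceed the limit.
        if current and len(current) + len(tagged) > max_tokens:
            samples.append(current)
            current = []
        current.extend(tagged)
    if current:
        samples.append(current)
    return samples


corpus = "Ping An offers auto insurance. Claims are paid quickly. Customers renew auto insurance yearly."
entities = {"Ping An": "ORG", "auto insurance": "PROD"}
samples = build_training_set(corpus, entities)
```

The resulting samples are the kind of token and label sequences that a sequence-labeling deep learning model could be trained on; the training itself is outside the scope of this sketch.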
Optionally, the knowledge graph constructing apparatus further includes an information database constructing module 507, where the information database constructing module 507 includes:
the knowledge resource website acquisition unit is used for acquiring at least one knowledge resource website;
the data information crawling unit is used for calling an information crawler tool to crawl the at least one knowledge resource website to obtain data information in the at least one knowledge resource website, wherein the knowledge resource websites include Baidu Baike (Baidu Encyclopedia), CNKI (China National Knowledge Infrastructure), and MBA Lib;
and the information database construction unit is used for constructing an information database according to the data information.
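As a rough sketch of how such an information database might be stored and queried, the example below keeps crawled facts as (head, relation, tail) rows in an in-memory SQLite table and returns the relationships whose two ends both belong to a key noun set. The triple layout, the table name, and the sample rows are assumptions; the source does not describe the database schema.

```python
import sqlite3
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head noun, relation, tail noun)


def build_information_database(triples: Iterable[Triple]) -> sqlite3.Connection:
    """Store crawled triples in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE relation (head TEXT, rel TEXT, tail TEXT)")
    conn.executemany("INSERT INTO relation VALUES (?, ?, ?)", triples)
    conn.commit()
    return conn


def relations_among(conn: sqlite3.Connection, key_nouns: Iterable[str]) -> List[Triple]:
    """Return relationships whose head and tail are both key nouns."""
    nouns = sorted(set(key_nouns))
    if not nouns:
        return []
    marks = ",".join("?" for _ in nouns)
    query = (
        "SELECT head, rel, tail FROM relation "
        f"WHERE head IN ({marks}) AND tail IN ({marks}) "
        "ORDER BY head, rel, tail"
    )
    return conn.execute(query, nouns + nouns).fetchall()


# Invented rows standing in for crawled data information.
conn = build_information_database([
    ("premium", "paid_for", "policy"),
    ("policy", "issued_by", "insurer"),
    ("policy", "mentions", "weather"),
])
found = relations_among(conn, ["policy", "premium", "insurer"])
```

The row that mentions "weather" is not returned because "weather" is not in the key noun set.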
Optionally, the knowledge graph constructing apparatus further includes a Bert fine-tuning training module 508, where the Bert fine-tuning training module 508 specifically includes:
a service corpus collecting unit, configured to collect service corpora of the selected service type, label service classification labels to the service corpora, and obtain a corpus classification training set;
and the fine tuning training unit is used for obtaining a Bert pre-training model, using the corpus classification training set as a newly added input vector of the Bert pre-training model, and performing fine tuning training on the Bert pre-training model to obtain a trained text classification model.
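The preparation of a corpus classification training set can be sketched as below: each service corpus text is paired with an integer id for its service classification label, and a small portion is held out for validation. The label names, the hold-out ratio, and the function name are assumptions. The fine-tuning itself, in which such pairs would be tokenized and fed to the Bert pre-training model, is not shown.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[str, int]  # (corpus text, label id)


def build_classification_set(
    labelled_corpora: Dict[str, List[str]],
    holdout: float = 0.25,
    seed: int = 0,
) -> Tuple[List[Example], List[Example], Dict[str, int]]:
    """Turn label -> texts into shuffled (text, label id) training and validation sets."""
    label_to_id = {label: i for i, label in enumerate(sorted(labelled_corpora))}
    examples: List[Example] = [
        (text, label_to_id[label])
        for label in sorted(labelled_corpora)
        for text in labelled_corpora[label]
    ]
    random.Random(seed).shuffle(examples)  # fixed seed keeps the split reproducible
    split = max(1, int(len(examples) * (1 - holdout)))
    return examples[:split], examples[split:], label_to_id


# Invented service corpora for a selected service type and a negative class.
corpora = {
    "insurance": ["premium is paid yearly", "policy covers collision"],
    "other": ["the match ended in a draw", "rain is expected tomorrow"],
}
train_set, validation_set, label_to_id = build_classification_set(corpora)
```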
Optionally, the knowledge graph constructing apparatus further includes a Bert pre-training module 509, where the Bert pre-training module is specifically configured to:
calling the text corpus training set to pre-train a bidirectional Transformer model to obtain initial parameters; and storing the initial parameters to obtain a Bert pre-training model.
Optionally, the knowledge-graph constructing apparatus further includes a knowledge-graph updating module, where the knowledge-graph updating module is specifically configured to:
based on a received knowledge graph update request, calling an information crawler tool to acquire newly added service texts of different types of services from a service database;
inputting the newly added service text of each type of service into a pre-established proper name recognition model, and extracting key nouns to obtain a newly added key noun set;
acquiring, from a preset information database and according to the newly added key noun set, the entity relationships among the newly added key nouns in the set;
calling a pre-established text classification model to analyze the correlation between the newly added key nouns, together with their corresponding entity relationships, and the selected service type, so as to obtain an analysis result;
and according to the analysis result, removing the newly added key nouns irrelevant to the selected service type from the newly added key noun set, deleting the corresponding entity relationships, and updating the knowledge graph of the selected service type.
The embodiment of the present invention can automatically collect business knowledge of the required field and automatically construct a knowledge graph for that field, which improves the relevance of the knowledge in the graph to the required business field and simplifies the construction of the knowledge graph. In addition, the knowledge graph can be updated in response to a received graph update request, so that the content of the knowledge graph constructed in this embodiment keeps pace with the updating and expansion of knowledge in the field.
The above fig. 5 and fig. 6 describe the knowledge graph constructing apparatus in the embodiment of the present invention in detail from the perspective of the modular functional entity, and the following describes the knowledge graph constructing apparatus in the embodiment of the present invention in detail from the perspective of the hardware processing.
Fig. 7 is a schematic structural diagram of a knowledge graph constructing apparatus 700 according to an embodiment of the present invention. The apparatus may vary considerably depending on its configuration or performance, and may include one or more processors (central processing units, CPUs) 710, a memory 720, and one or more storage media 730 (e.g., one or more mass storage devices) that store applications 733 or data 732. The memory 720 and the storage medium 730 may provide transient or persistent storage. The program stored on the storage medium 730 may include one or more modules (not shown), each of which may include a sequence of instruction operations for the knowledge graph constructing apparatus 700. Further, the processor 710 may be configured to communicate with the storage medium 730 and to execute, on the knowledge graph constructing apparatus 700, the sequence of instruction operations stored in the storage medium 730.
The knowledge graph constructing apparatus 700 may also include one or more power supplies 740, one or more wired or wireless network interfaces 750, one or more input-output interfaces 760, and/or one or more operating systems 731, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the structure shown in fig. 7 does not limit the knowledge graph constructing apparatus, which may include more or fewer components than shown, combine some components, or arrange the components differently.
The present invention also provides a knowledge-graph building apparatus, which comprises a memory and a processor, wherein the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the knowledge-graph building method in the above embodiments.
Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
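The linkage of data blocks by a cryptographic method can be illustrated with a minimal hash chain. This sketch shows only the linking and validity check; it omits consensus, networking, and every platform layer, and the block fields and function names are assumptions rather than part of the described system.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREV = "0" * 64  # placeholder previous hash for the first block


def block_hash(block: Dict) -> str:
    """Hash the block's index, transactions, and previous hash with SHA-256."""
    payload = json.dumps(
        {k: block[k] for k in ("index", "transactions", "prev_hash")},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict], transactions: List[str]) -> Dict:
    """Create a block that stores the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    block = {"index": len(chain), "transactions": transactions, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain: List[Dict]) -> bool:
    """Check every block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV
        if block["hash"] != block_hash(block) or block["prev_hash"] != expected_prev:
            return False
    return True


chain: List[Dict] = []
append_block(chain, ["knowledge graph v1 stored"])
append_block(chain, ["knowledge graph v2 stored"])
chain_ok = is_valid(chain)
```

Changing the transactions of an earlier block without recomputing the hashes makes `is_valid` return `False`, which is the anti-counterfeiting property the paragraph refers to.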
The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, and which may also be a volatile computer readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the method of knowledge-graph construction.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A knowledge graph construction method is characterized by comprising the following steps:
calling an information crawler tool to acquire a service text of the selected service type from a service database;
inputting the service text of each type of service into a pre-established proper name recognition model, and extracting key nouns to obtain a key noun set;
acquiring entity relationships among key nouns in the key noun set in a preset information database according to the key noun set;
calling a pre-established text classification model, and analyzing the correlation between the key nouns and the corresponding entity relations and the selected service types to obtain an analysis result, wherein the analysis result comprises correlation degree indexes between the key nouns and the selected service types;
according to the analysis result, removing key nouns irrelevant to the selected service type from the key noun set, and deleting corresponding entity relations;
and constructing the knowledge graph of the selected service type according to the correlation degree index by using the key noun set after the irrelevant key nouns are removed and the rest relevant entity relations.
2. The method of knowledge-graph construction according to claim 1, wherein said invoking an information crawler tool to obtain a service text of the selected service type from a service database comprises:
sending a source code acquisition request to a target website in the service database, and reading a source code of the target website after the source code acquisition request passes;
downloading page data in the target website according to the source code of the target website;
and identifying the content in the page data to obtain the service text of the selected service type.
3. The method of claim 2, wherein before inputting the service text of each service type into a pre-established proper name recognition model and extracting key nouns to obtain a key noun set, the method further comprises:
collecting text corpus information;
labeling words in the text corpus information, and performing sentence segmentation and recombination on the labeled text corpus information to obtain a text corpus training set;
and calling the text corpus training set to train the deep learning model to obtain a proper name recognition model.
4. The knowledge graph construction method according to claim 3, wherein before invoking a pre-established text classification model, analyzing the correlation between the key nouns and the corresponding entity relations and the corresponding services, and obtaining an analysis result, the method further comprises:
collecting the service corpora of the selected service type, and labeling service classification labels on the service corpora to obtain a corpus classification training set;
and acquiring a Bert pre-training model, taking the corpus classification training set as a newly added input vector of the Bert pre-training model, and performing fine-tuning training on the Bert pre-training model to obtain a trained text classification model.
5. The method for constructing a knowledge graph according to claim 4, wherein before the acquiring a Bert pre-training model, taking the corpus classification training set as a newly added input vector of the Bert pre-training model, and performing fine-tuning training on the Bert pre-training model to obtain a trained text classification model, the method further comprises:
calling the text corpus training set to pre-train a bidirectional Transformer model to obtain initial parameters;
and storing the initial parameters to obtain a Bert pre-training model.
6. The method for constructing a knowledge graph according to claim 4, wherein before the obtaining entity relationships between the key terms in the key term set in a preset information database according to the key term set, the method further comprises:
acquiring at least one knowledge resource website, wherein the knowledge resource websites include Baidu Baike (Baidu Encyclopedia), CNKI (China National Knowledge Infrastructure), and MBA Lib;
calling an information crawler tool to crawl the at least one knowledge resource website to obtain data information in the at least one knowledge resource website;
and constructing an information database according to the data information.
7. The method of knowledge-graph construction according to any of claims 1-6, further comprising, after said constructing a knowledge-graph of said selected business type:
based on a received knowledge graph update request, calling an information crawler tool to acquire newly added service texts of different types of services from a service database;
inputting the newly added service text of each type of service into a pre-established proper name recognition model, and extracting key nouns to obtain a newly added key noun set;
acquiring entity relationships among all newly added key nouns in the newly added key noun set in a preset information database according to the newly added key noun set;
calling a pre-established text classification model, and analyzing the correlation between the newly-added key nouns and the corresponding entity relationship and the selected service type to obtain an analysis result;
and according to the analysis result, removing the newly added key nouns irrelevant to the selected service type from the newly added key noun set, deleting the corresponding entity relationship, and updating the knowledge graph of the selected service type.
8. A knowledge-graph building apparatus, characterized in that the knowledge-graph building apparatus comprises:
the text acquisition module is used for calling an information crawler tool to acquire the service text of the selected service type from the service database;
a noun extraction module, configured to input the service text of each type of service into a pre-established proper name recognition model, and extract key nouns to obtain a key noun set;
the entity relationship acquisition module is used for acquiring, from a preset information database and according to the key noun set, the entity relationships among the key nouns in the key noun set;
the correlation analysis module is used for calling a pre-established text classification model, analyzing the correlation between the key nouns and the corresponding entity relations and the selected service types to obtain an analysis result, wherein the analysis result comprises correlation degree indexes between the key nouns and the selected service types;
a knowledge graph establishing module, configured to remove key nouns irrelevant to the selected service type from the key noun set according to the analysis result, and delete corresponding entity relationships; and constructing the knowledge graph of the selected service type according to the correlation degree index by using the key noun set after the irrelevant key nouns are removed and the rest relevant entity relations.
9. A knowledge-graph building apparatus, characterized in that the knowledge-graph building apparatus comprises: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invokes the instructions in the memory to cause the knowledge-graph building apparatus to perform the steps of the knowledge-graph building method of any one of claims 1-7.
10. A computer readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the knowledge-graph construction method according to any one of claims 1-7.
CN202011635788.5A 2020-12-31 2020-12-31 Knowledge graph construction method, device, equipment and storage medium Active CN112749284B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011635788.5A CN112749284B (en) 2020-12-31 2020-12-31 Knowledge graph construction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011635788.5A CN112749284B (en) 2020-12-31 2020-12-31 Knowledge graph construction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112749284A CN112749284A (en) 2021-05-04
CN112749284B true CN112749284B (en) 2021-12-17

Family

ID=75650969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011635788.5A Active CN112749284B (en) 2020-12-31 2020-12-31 Knowledge graph construction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112749284B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377957B (en) * 2021-07-01 2022-09-30 浙江工业大学 National economy industry classification method and system based on knowledge graph
CN113987146B (en) * 2021-10-22 2023-01-31 国网江苏省电力有限公司镇江供电分公司 Dedicated intelligent question-answering system of electric power intranet
CN114721833B (en) * 2022-05-17 2022-08-23 中诚华隆计算机技术有限公司 Intelligent cloud coordination method and device based on platform service type
CN115098755A (en) * 2022-06-20 2022-09-23 国网甘肃省电力公司电力科学研究院 Scientific and technological information service platform construction method and scientific and technological information service platform
CN115759256A (en) * 2022-11-24 2023-03-07 中安华邦(北京)安全生产技术研究院股份有限公司 Method, system, medium and equipment for constructing safety production digital knowledge base
CN116401375B (en) * 2023-03-23 2024-02-20 深圳宏鹏数字供应链管理有限公司 Knowledge graph construction method and system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103488724A (en) * 2013-09-16 2014-01-01 复旦大学 Book-oriented reading field knowledge map construction method
CN106126503A (en) * 2016-07-12 2016-11-16 海信集团有限公司 Business scope localization method and terminal
CN106776881A (en) * 2016-11-28 2017-05-31 中国科学院软件研究所 A kind of realm information commending system and method based on microblog
CN109597894A (en) * 2018-09-30 2019-04-09 阿里巴巴集团控股有限公司 A kind of correlation model generation method and device, a kind of data correlation method and device
CN109766445A (en) * 2018-12-13 2019-05-17 平安科技(深圳)有限公司 A kind of knowledge mapping construction method and data processing equipment
CN111444353A (en) * 2020-04-03 2020-07-24 杭州叙简科技股份有限公司 Construction and use method of warning situation knowledge graph
CN111831833A (en) * 2020-07-27 2020-10-27 人民卫生电子音像出版社有限公司 Knowledge graph construction method and device
CN111967263A (en) * 2020-07-30 2020-11-20 北京明略软件系统有限公司 Domain named entity denoising method and system based on entity topic relevance

Also Published As

Publication number Publication date
CN112749284A (en) 2021-05-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant