CN115658931B - Encyclopedic knowledge graph dynamic updating method, device, equipment and medium - Google Patents

Publication number
CN115658931B
Authority
CN
China
Legal status
Active
Application number
CN202211681737.5A
Other languages
Chinese (zh)
Other versions
CN115658931A (en)
Inventor
侯磊
张益�
孟斌杰
刘丁枭
逄凡
李涓子
张鹏
唐杰
许斌
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of computers, in particular to a method, a device, equipment and a medium for dynamically updating an encyclopedic knowledge graph, wherein the method comprises the following steps: acquiring a data list to be updated of the encyclopedic knowledge graph, wherein the data list to be updated comprises page texts of entries to be updated and/or page texts linked in entry blurb; traversing the data list to be updated according to a preset updating period, extracting preset key information of a page text in the data list to be updated, and updating a triple in the encyclopedic knowledge graph based on the preset key information to obtain an updating result; and structuring the updating result to obtain an updated encyclopedic knowledge graph, and updating the encyclopedic knowledge graph stored in the database based on the updated encyclopedic knowledge graph. Therefore, the problems that the encyclopedic knowledge graph cannot be automatically updated and maintained due to the fact that manual modification, audit and updating are needed in the related technology, updating efficiency is low, maintenance cost is high and the like are solved.

Description

Encyclopedic knowledge graph dynamic updating method, device, equipment and medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a medium for dynamically updating an encyclopedic knowledge graph.
Background
A knowledge graph describes the entities and concepts that exist in the real world and the relationships among them. It expresses information on the internet in a form closer to how humans understand the world, and provides a better way to organize, manage and understand the internet's massive volume of information. Because a knowledge graph expresses knowledge in a structured form, it serves downstream natural language processing tasks such as search, question answering and reading comprehension well. The accuracy, correctness and timeliness of the knowledge in the graph are therefore of great importance.
In the real world, knowledge is continuously updated and extended, and a knowledge graph must be updated accordingly. For large-scale encyclopedic knowledge graphs at a scale of tens of millions or even hundreds of millions, manual modification, review and maintenance are inefficient and costly, so automatic updating is the economical and efficient approach.
Disclosure of Invention
With the development and application of artificial intelligence and big data, knowledge graphs are widely used as a principal means of storing knowledge. Important general-domain knowledge sources can provide valuable resources for building a knowledge graph, and because such sources are publicly maintained, their knowledge is updated in a timely manner. Many knowledge graph products, such as XLORE (a cross-language knowledge graph), therefore use them as an important knowledge source.
In modern artificial intelligence, most algorithms are trained on very large knowledge bases, so the final performance of an artificial intelligence product depends to a great extent on the quality and quantity of the knowledge graphs its algorithms use. Over time, knowledge accumulates, attribute values change, and the information available for generating a knowledge graph grows substantially. Training a model on stale knowledge at that point introduces errors, so dynamic updating of the knowledge graph is indispensable.
Existing knowledge graphs still have the following defects:
(1) Knowledge graphs are currently updated by periodically re-crawling all the data and regenerating the graph, which is time-consuming and labor-intensive and makes updates slow;
(2) Some knowledge graphs are updated by crawling encyclopedias, but because of instance disambiguation, update efficiency and similar issues, updates can only be made manually in the back end after a user submits a request;
(3) Only certain fixed attributes of the instances in the knowledge graph are updated, so the approach can only be used for information updates in a specific field;
(4) Only the triples in the infobox are extracted, and the information in text such as the introduction and the body is not processed further, which wastes it;
(5) Maintenance costs are high. Most knowledge graphs require a dedicated project team for periodic maintenance.
Therefore, the embodiments of the present application provide a method, an apparatus, a device and a medium for dynamically updating an encyclopedic knowledge graph, so as to solve the problems in the related art that the encyclopedic knowledge graph cannot be updated and maintained automatically, which makes updating inefficient and maintenance costly, and that the relevant information in the text to be updated is not processed further, which wastes resources and limits the scope of the update.
An embodiment of a first aspect of the present application provides a method for dynamically updating an encyclopedic knowledge graph, including the following steps: acquiring a data list to be updated of the encyclopedic knowledge graph, wherein the data list to be updated comprises page texts of entries to be updated and/or page texts linked in entry blurb; traversing the data list to be updated according to a preset updating period, extracting preset key information of a page text in the data list to be updated, and updating the triple in the encyclopedic knowledge graph based on the preset key information to obtain an updating result; structuring the updating result to obtain an updated encyclopedic knowledge graph, and updating the encyclopedic knowledge graph stored in the database based on the updated encyclopedic knowledge graph.
Optionally, acquiring the data list to be updated of the encyclopedic knowledge graph includes: acquiring popular entries and/or specified entries, where a popular entry is an entry whose total view count on the encyclopedia page is greater than or equal to a first preset number and whose modification count within a preset time period is greater than or equal to a second preset number; identifying the links in the entry blurbs of the popular entries and the specified entries, and acquiring, for the popular entries, the specified entries and the linked entries, the entry name, the uniform resource locator (URL), the identifier in the encyclopedic knowledge graph and/or the historical modification count; and acquiring the latest page content of each such entry according to its URL or identifier, and downloading the latest page content to the data list to be updated if the modification count of the latest page content is greater than the historical modification count.
Optionally, the preset key information includes one or more of the name, subtitle, uniform resource locator (URL), brief introduction, infobox, body text and modification record of an instance, and updating the triples in the encyclopedic knowledge graph based on the preset key information to obtain an update result includes: supplementing the classification to which the instance belongs according to the attributes of the instance in the infobox; recording the link positions in the text, generating a link dictionary, converting the links in the text into link text, and marking the link text with a preset symbol; extracting the triples in the text, identifying the id (identifier) of the instance to which each triple belongs and the positions of the triple fields in the text, and cleaning the triples to obtain cleaned triples; and supplementing the links of the triples using the link dictionary and the positions in the text of the triple fields of the cleaned triples, and modifying the instances to which the triples belong, to obtain the update result.
Optionally, supplementing the links of the triples using the link dictionary and the positions in the text of the triple fields of the cleaned triples includes: if triples in the same text segment have the same subject name and any of those triples has a link, adding that link to the same subject in all the triples in that text segment; and if the subject of a triple matches the name of the instance to which the triple belongs, adding the link of that instance to the subject.
Optionally, modifying the instance to which a triple belongs includes: if the link of the triple's subject points to the instance to which the triple currently belongs, or the subject has no link, leaving the instance unchanged; and if the subject has a link that does not point to the instance to which the triple belongs, changing the instance to the one the subject's link points to.
Optionally, structuring the update result to obtain a structured encyclopedic knowledge graph includes: aligning each instance to be updated with the instance list in the current encyclopedic knowledge graph using the name, the subtitle and the URL, setting the id of the instance to be updated to match the current encyclopedic knowledge graph after alignment, and assigning a new id if no id exists; aligning the attributes of the instance to be updated with the attribute list in the current encyclopedic knowledge graph using their names, setting the id of the instance to be updated to match the current encyclopedic knowledge graph after alignment, and assigning a new id if no id exists; setting the id of the concept to which the instance to be updated belongs to match the instance list in the current encyclopedic knowledge graph; disambiguating by using the aligned ids in place of the names of instances, concepts and attributes; and generating an upper-layer concept structure from the concept structure, generating the encyclopedic knowledge graph from the information in the update result, and keeping the data formats of the upper-layer concept structure and of the encyclopedic knowledge graph consistent with the data format of the encyclopedic knowledge graph in the database.
An embodiment of a second aspect of the present application provides an apparatus for dynamically updating an encyclopedic knowledge graph, including: an acquisition module, configured to acquire a data list to be updated of the encyclopedic knowledge graph, where the data list to be updated includes page texts of entries to be updated and/or page texts linked in entry blurbs; a first updating module, configured to traverse the data list to be updated according to a preset updating period, extract preset key information of the page texts in the data list to be updated, and update the triples in the encyclopedic knowledge graph based on the preset key information to obtain an update result; and a second updating module, configured to structure the update result to obtain an updated encyclopedic knowledge graph, and to update the encyclopedic knowledge graph stored in the database based on the updated encyclopedic knowledge graph.
Optionally, the acquisition module is further configured to: acquire popular entries and/or specified entries, where a popular entry is an entry whose total view count on the encyclopedia page is greater than or equal to a first preset number and whose modification count within a preset time period is greater than or equal to a second preset number; identify the links in the entry blurbs of the popular entries and the specified entries, and acquire, for the popular entries, the specified entries and the linked entries, the entry name, the uniform resource locator (URL), the identifier in the encyclopedic knowledge graph and/or the historical modification count; and acquire the latest page content of each such entry according to its URL or identifier, and download the latest page content to the data list to be updated if the modification count of the latest page content is greater than the historical modification count.
Optionally, the first updating module is further configured to: supplementing the belonged classification of the instance according to the attribute of the instance in the information box infobox; recording a link position in a text, generating a link dictionary, converting a link in the text into a link text, and identifying the link text by using a preset symbol; extracting triples in the text, identifying the instance id to which the triples belong and the positions of the triples in the text, and cleaning the triples to obtain cleaned triples; and supplementing the link of the triple by using the link dictionary and the position of the triple field in the text in the cleaned triple, and modifying the example to which the triple belongs to obtain an updating result.
Optionally, the first updating module is further configured to: if the subject names of the triples in the same text segment are the same and any triples have links, adding the same links to the same subjects in all the triples in the same text segment; and if the triple subject is consistent with the name of the belonged instance, adding a link of the belonged instance in the subject.
Optionally, the first updating module is further configured to: leave the instance to which a triple belongs unchanged if the link of the triple's subject points to that instance or the subject has no link; and, if the subject has a link that does not point to the instance to which the triple belongs, change the instance to the one the subject's link points to.
Optionally, the second updating module is further configured to: align each instance to be updated with the instance list in the current encyclopedic knowledge graph using the name, the subtitle and the URL, set the id of the instance to be updated to match the current encyclopedic knowledge graph after alignment, and assign a new id if no id exists; align the attributes of the instance to be updated with the attribute list in the current encyclopedic knowledge graph using their names, set the id of the instance to be updated to match the current encyclopedic knowledge graph after alignment, and assign a new id if no id exists; set the id of the concept to which the instance to be updated belongs to match the instance list in the current encyclopedic knowledge graph; disambiguate by using the aligned ids in place of the names of instances, concepts and attributes; and generate an upper-layer concept structure from the concept structure, generate the encyclopedic knowledge graph from the information in the update result, and keep the data formats of the upper-layer concept structure and of the encyclopedic knowledge graph consistent with the data format of the encyclopedic knowledge graph in the database.
An embodiment of a third aspect of the present application provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the program to implement the encyclopedic knowledge graph dynamic updating method according to the foregoing embodiments.
A fourth aspect of the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor, and is used to implement the encyclopedic knowledge graph dynamic updating method according to the foregoing embodiment.
Therefore, the application has at least the following beneficial effects:
(1) According to the embodiment of the application, the combination of the browsing times and the modification times in the time period is used as the judgment standard of the timing updating, so that the meaningless webpage scanning is reduced, and the efficiency of the timing updating is improved.
(2) The embodiments of the present application update popular entries and specified entries automatically with a single operation, which greatly reduces the workload of maintaining and updating the knowledge graph while still allowing manual intervention and supplementation, so that a single person can maintain a knowledge graph at the scale of billions of data items.
(3) According to the embodiment of the application, the link entries in the entry introduction are updated, so that the updating range is more comprehensive.
(4) The embodiments of the present application update the extended content synchronously while updating the original encyclopedia content, and retain and supplement the links in the triples, which increases the amount of knowledge in the graph while reducing entity ambiguity in the triples.
(5) The embodiment of the application contains knowledge in the general field, and the knowledge range is wider.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of a method for dynamically updating an encyclopedic knowledge graph according to an embodiment of the application;
FIG. 2 is a flow chart of dynamic update according to an embodiment of the present application;
FIG. 3 is a block diagram of an apparatus for dynamically updating an encyclopedic knowledge graph according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present application and should not be construed as limiting the present application.
The encyclopedic knowledge graph dynamic updating method, device, equipment and medium of the embodiment of the application are described below with reference to the attached drawings. Specifically, fig. 1 is a schematic flowchart of a method for dynamically updating an encyclopedic knowledge graph according to an embodiment of the present application.
As shown in fig. 1, the encyclopedic knowledge graph dynamic updating method includes the following steps:
In step S101, a list of data to be updated of the encyclopedic knowledge graph is acquired.
The data list to be updated comprises page texts of the entries to be updated and/or linked page texts in the entry blurb.
It can be understood that the embodiment of the application facilitates subsequent updating of the knowledge graph by acquiring the data list to be updated of the encyclopedic knowledge graph.
In the embodiment of the present application, acquiring the data list to be updated of the encyclopedic knowledge graph includes: acquiring popular entries and/or specified entries, where a popular entry is an entry whose total view count on the encyclopedia page is greater than or equal to a first preset number and whose modification count within a preset time period is greater than or equal to a second preset number; identifying the links in the entry blurbs of the popular entries and the specified entries, and acquiring, for the popular entries, the specified entries and the linked entries, the entry name, the uniform resource locator (URL), the identifier in the encyclopedic knowledge graph and/or the historical modification count; and acquiring the latest page content of each such entry according to its URL or identifier, and downloading the latest page content to the data list to be updated if the modification count of the latest page content is greater than the historical modification count.
The popular entries may be obtained automatically by the system according to a filtering rule, which is not specifically limited herein.
A specified entry is an entry set according to the user's intention to modify it, that is, an entry that the user considers to need updating; this is not specifically limited herein.
The first preset number may be a number set by a user, and may be modified or set according to the user's intention, which is not specifically limited herein.
The preset time duration may be a time duration set by a user, and may be modified or set according to the user intention, which is not specifically limited herein.
The second preset number may be a number set by a user, and may be modified or set according to the user's intention, which is not specifically limited herein.
It can be understood that the embodiment of the present application first gathers the popular entries and/or the specified entries, where a popular entry is one whose view count on the encyclopedia page is greater than or equal to a given number and whose modification count within a given time period is greater than or equal to a given number. It then identifies the links in the entry blurbs of the popular entries and the specified entries, and obtains the entry name, the URL, the identifier in the encyclopedic knowledge graph and/or the historical modification count of the popular entries, the specified entries and the linked entries. The latest page content of each entry is fetched according to its URL or identifier, and is downloaded to the data list to be updated when its modification count is greater than the historical modification count. Using the combination of view count and modification count within a time period as the criterion for scheduled updates reduces pointless page scanning and improves the efficiency of scheduled updates. Updating popular entries and specified entries automatically with a single operation greatly reduces the workload of maintaining and updating the knowledge graph while still allowing manual intervention and supplementation, so that a single person can maintain a knowledge graph at the scale of billions of data items.
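The selection logic of step S101 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `EntryStats` record, the function names, and the convention that an entry with no recorded history gets a historical modification count of -1 (so it is always downloaded) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class EntryStats:
    """Hypothetical per-entry statistics gathered from the encyclopedia."""
    name: str
    url: str
    total_views: int    # total view count of the encyclopedia page
    recent_edits: int   # modifications within the preset time period
    total_edits: int    # modification count reported by the latest page
    blurb_links: List[str] = field(default_factory=list)  # entries linked in the blurb


def is_popular(e: EntryStats, min_views: int, min_recent_edits: int) -> bool:
    # Both thresholds must be met: first preset number and second preset number.
    return e.total_views >= min_views and e.recent_edits >= min_recent_edits


def build_update_list(stats: Dict[str, EntryStats], specified: Set[str],
                      known_edits: Dict[str, int],
                      min_views: int, min_recent_edits: int) -> List[str]:
    # Seeds are the popular entries plus the entries the user specified.
    seeds = [e for e in stats.values()
             if e.name in specified or is_popular(e, min_views, min_recent_edits)]
    # Candidates are the seeds plus the entries linked from their blurbs.
    candidates: Dict[str, EntryStats] = {}
    for e in seeds:
        candidates.setdefault(e.name, e)
        for linked in e.blurb_links:
            if linked in stats:
                candidates.setdefault(linked, stats[linked])
    # Keep a page only if it was modified since the last recorded count
    # (assumption: unseen URLs default to -1 and are always kept).
    return [e.url for e in candidates.values()
            if e.total_edits > known_edits.get(e.url, -1)]
```

Comparing modification counts before downloading is what keeps the scheduled job from re-processing unchanged pages.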
In step S102, the data list to be updated is traversed according to a preset update cycle, preset key information of a page text in the data list to be updated is extracted, and the triple in the encyclopedic knowledge graph is updated based on the preset key information, so as to obtain an update result.
The preset update period may be an update period set by a user, and may be set or modified according to an actual requirement of the user, which is not specifically limited herein.
The preset key information includes one or more of the name, subtitle, uniform resource locator (URL), brief introduction, infobox, body text and modification record of an instance. Those skilled in the art should understand that the preset key information includes but is not limited to the above, and that what counts as key information may be determined according to the actual situation; this is not specifically limited herein.
A triple is a (subject, predicate, object) statement in the knowledge graph, which is not specifically limited herein.
It can be understood that traversing the data list to be updated at the set period, extracting the key information of the page texts in the list, and updating the triples in the encyclopedic knowledge graph based on that key information yields an update result that is used to further update the encyclopedic knowledge graph in the subsequent step.
In the embodiment of the present application, updating the triples in the encyclopedic knowledge graph based on the preset key information to obtain an update result includes: supplementing the classification to which the instance belongs according to the attributes of the instance in the infobox; recording the link positions in the text, generating a link dictionary, converting the links in the text into link text, and marking the link text with a preset symbol; extracting the triples in the text, identifying the id of the instance to which each triple belongs and the positions of the triple fields in the text, and cleaning the triples to obtain cleaned triples; and supplementing the links of the triples using the link dictionary and the positions in the text of the triple fields of the cleaned triples, and modifying the instances to which the triples belong, to obtain the update result.
The preset symbol may be a symbol set by a user, for example a special symbol used to mark the link text; this is not specifically limited herein.
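One way to picture the link dictionary and the marked link text is the sketch below. It assumes, purely for illustration, that links appear in the page text as simple `<a href="...">text</a>` anchors and that `[[` and `]]` are the preset symbols; the function name and the choice of character offsets as dictionary keys are likewise hypothetical.

```python
import re

# Assumed anchor shape for this sketch; real encyclopedia markup may differ.
ANCHOR = re.compile(r'<a href="([^"]*)">(.*?)</a>')


def extract_links(html: str, open_mark: str = "[[", close_mark: str = "]]"):
    """Return (plain_text, marked_text, link_dict).

    link_dict maps the (start, end) character offsets of each anchor text
    in plain_text to the link target, so later steps can look up a link
    by the position of a triple field.
    """
    link_dict = {}
    plain_parts, marked_parts = [], []
    plain_len = 0
    pos = 0
    for m in ANCHOR.finditer(html):
        before = html[pos:m.start()]
        plain_parts.append(before)
        marked_parts.append(before)
        plain_len += len(before)
        text = m.group(2)
        link_dict[(plain_len, plain_len + len(text))] = m.group(1)
        plain_parts.append(text)
        marked_parts.append(open_mark + text + close_mark)
        plain_len += len(text)
        pos = m.end()
    tail = html[pos:]
    plain_parts.append(tail)
    marked_parts.append(tail)
    return "".join(plain_parts), "".join(marked_parts), link_dict
```

Keeping offsets relative to the plain text, rather than the marked text, means the positions stay valid regardless of which marker symbols are chosen.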
The triples are cleaned mainly by deleting triples whose predicates are too long, deleting triples whose subjects, predicates and objects are special symbols, and the like; this is not specifically limited herein.
It can be understood that the embodiment of the present application supplements the classification to which an instance belongs according to the attributes of the instance in the infobox, records the link positions in the text to generate a link dictionary, converts the links in the text into link text, and marks the link text with a special symbol. It then extracts the triples in the text, identifies the id of the instance to which each triple belongs and the positions of the triple fields in the text, and cleans the triples. Finally, it supplements the links of the triples using the link dictionary and the positions of the triple fields in the text, and modifies the instances to which the triples belong, to obtain the update result. In this way, triples with links are extracted from the encyclopedia introduction, their instances are reassigned and modified, and the extracted triples are updated synchronously; the links in the triples are retained and supplemented, which increases the amount of knowledge in the graph and reduces entity ambiguity in the triples.
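The cleaning rules can be sketched as a simple filter. The length threshold of 20 characters and the interpretation that a triple is dropped when any one of its fields is empty or consists only of symbols are assumptions for this example; the source does not fix either.

```python
import re
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# A field made up solely of non-word characters (punctuation, symbols, underscores).
ONLY_SYMBOLS = re.compile(r"[\W_]+")


def clean_triples(triples: List[Triple], max_predicate_len: int = 20) -> List[Triple]:
    kept = []
    for t in triples:
        # Rule 1: drop triples whose predicate is too long (threshold is an assumption).
        if len(t.predicate) > max_predicate_len:
            continue
        # Rule 2: drop triples with an empty or symbol-only field (assumed reading).
        if any(not f.strip() or ONLY_SYMBOLS.fullmatch(f.strip()) for f in t):
            continue
        kept.append(t)
    return kept
```

Because `\W` is Unicode-aware in Python 3, fields written in Chinese or other scripts are treated as words and are not discarded by the symbol rule.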
In an embodiment of the present application, supplementing the links of the triples using the link dictionary and the positions in the text of the triple fields of the cleaned triples includes: if triples in the same text segment have the same subject name and any of those triples has a link, adding that link to the same subject in all the triples in that text segment; and if the subject of a triple matches the name of the instance to which the triple belongs, adding the link of that instance to the subject.
It can be understood that, in the embodiment of the present application, if triples in the same text segment have the same subject name and one of them has a link, that link is added to the same subject in all the triples in that segment; and if the subject of a triple matches the name of the instance to which it belongs, the instance's link is added to the subject. The extended content is thus updated synchronously with the original encyclopedia content, the links in the triples are retained and supplemented, and entity ambiguity in the triples is reduced.
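The two supplementing rules can be sketched as below. The `LinkedTriple` record and its `segment` field are hypothetical, and the sketch assumes that the second rule only fills in a subject that still has no link, which the source does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinkedTriple:
    subject: str
    predicate: str
    obj: str
    segment: int                         # index of the text segment the triple came from
    subject_link: Optional[str] = None   # link attached to the subject, if any


def supplement_links(triples: List[LinkedTriple],
                     instance_name: str, instance_link: str) -> List[LinkedTriple]:
    # Rule 1: within one text segment, a link found on a subject is copied
    # to every other triple in that segment with the same subject name.
    known = {}
    for t in triples:
        if t.subject_link is not None:
            known.setdefault((t.segment, t.subject), t.subject_link)
    for t in triples:
        if t.subject_link is None:
            t.subject_link = known.get((t.segment, t.subject))
    # Rule 2: a subject named like the owning instance gets that instance's
    # link (assumption: only when it has no link yet).
    for t in triples:
        if t.subject_link is None and t.subject == instance_name:
            t.subject_link = instance_link
    return triples
```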
In the embodiment of the present application, modifying the instance to which a triple belongs includes: if the link of the triple's subject points to the instance to which the triple currently belongs, or the subject has no link, leaving the instance unchanged; and if the subject has a link that does not point to the instance to which the triple belongs, changing the instance to the one the subject's link points to.
It can be understood that, in the embodiment of the present application, if the link of the triple's subject points to the current instance, or the subject has no link, the instance to which the triple belongs does not need to be modified; if the subject has a link that does not point to the instance to which the triple belongs, the instance is changed to the one the subject's link points to, which avoids entity ambiguity in the triples.
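The reassignment rule reduces to a small decision function. The sketch below is illustrative only: `link_to_instance_id` is a hypothetical lookup from link target to instance id, and keeping the current instance when a link cannot be resolved is an assumption the source does not cover.

```python
from typing import Dict, Optional


def resolve_owner(current_instance_id: str,
                  subject_link: Optional[str],
                  link_to_instance_id: Dict[str, str]) -> str:
    """Return the id of the instance a triple should belong to."""
    # No link on the subject: the triple stays with its current instance.
    if subject_link is None:
        return current_instance_id
    linked_id = link_to_instance_id.get(subject_link)
    # Link points to the current instance, or cannot be resolved
    # (assumption): nothing changes.
    if linked_id is None or linked_id == current_instance_id:
        return current_instance_id
    # Link points to a different instance: move the triple there.
    return linked_id
```

For example, a triple about "Beijing" extracted from the page of a university is moved to the Beijing instance when its subject carries the Beijing link.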
In step S103, the update result is structured to obtain an updated encyclopedic knowledge graph, and the encyclopedic knowledge graph stored in the database is updated based on the updated encyclopedic knowledge graph.
It can be understood that, in the embodiment of the application, the updated encyclopedic knowledge graph is obtained by structuring the update result, the encyclopedic knowledge graph stored in the database is updated based on the updated encyclopedic knowledge graph, manual modification and review are not needed, dynamic update of the encyclopedic knowledge graph can be performed by automatically and periodically updating related entries, the update efficiency is improved, and the maintenance cost is reduced.
In the embodiment of the present application, the step of structuring the update result to obtain a structured encyclopedia knowledge graph includes: aligning an example to be updated with an example list in the current encyclopedia knowledge graph by using a name, a subheading and a uniform resource locator url, setting the id of the example to be updated to be consistent with the current encyclopedia knowledge graph after aligning, and if the id does not exist, giving a new id; aligning the attributes in the to-be-updated instance with the attribute list in the current encyclopedic knowledge graph by using the names, setting the id of the to-be-updated instance to be consistent with the current encyclopedic knowledge graph after aligning, and if the id does not exist, giving a new id; setting the concept id of the example to be updated to be consistent with the example list in the current encyclopedic knowledge graph; disambiguating using the aligned id instead of the name of the instance, concept, and attribute; and generating an upper-layer concept structure through the concept structure, generating an encyclopedia knowledge graph through the updated result information, and keeping the data formats of the upper-layer concept structure and the encyclopedia knowledge graph consistent with the encyclopedia knowledge graph in the database.
The data format may be a ttl format, and may be set or modified according to the actual needs of the user, which is not specifically limited herein.
The database may be a virtuoso database, and may be set or modified according to the actual needs of the user, which is not specifically limited herein.
It can be understood that, in the embodiment of the present application, the name, the subtitle, and the uniform resource locator url are used to align the to-be-updated instance with the instance list in the current encyclopedic knowledge graph, and after the alignment, the id of the to-be-updated instance is set to be consistent with the current encyclopedic knowledge graph; aligning the attribute in the to-be-updated instance with an attribute list in the current encyclopedic knowledge graph by using the name, and setting the id of the to-be-updated instance to be consistent with the current encyclopedic knowledge graph after aligning; setting the belonged concept id of the to-be-updated example to be consistent with an example list in the current encyclopedic knowledge graph; and disambiguates by replacing names of instances, concepts and attributes with aligned ids; and carrying out knowledge modeling and acquisition, keeping the data format in the knowledge modeling and knowledge acquisition consistent with the encyclopedic knowledge map in the database, conveniently storing the data format in the database, and structuring the updating result to obtain the structured encyclopedic knowledge map.
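As an illustration of the alignment step, the following sketch reuses an existing id when an instance matches an entry of the current graph and assigns a new id otherwise. Treating alignment as an exact match on the (name, subtitle, url) tuple and using sequential integer ids are both assumptions; the class name `InstanceAligner` is hypothetical.

```python
import itertools
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (name, subtitle, url)


class InstanceAligner:
    """Map instances to ids, reusing ids already present in the graph."""

    def __init__(self, existing: Dict[Key, int]):
        self.ids: Dict[Key, int] = dict(existing)
        # New ids continue after the largest id already in use (assumption).
        self._next = itertools.count(max(self.ids.values(), default=0) + 1)

    def align(self, name: str, subtitle: str, url: str) -> int:
        key = (name, subtitle, url)
        if key not in self.ids:
            self.ids[key] = next(self._next)  # not in the graph: new id
        return self.ids[key]
```

The same pattern would apply to attributes, keyed by name only.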
According to the encyclopedic knowledge graph dynamic updating method provided by the embodiment of the application, the data list to be updated of the encyclopedic knowledge graph is obtained, the data list to be updated is traversed according to the preset updating period, the key information of the page text in the data list is extracted, the triples in the encyclopedic knowledge graph are updated based on the key information to obtain the updating result, the updating result is structured to obtain the updated encyclopedic knowledge graph, and the encyclopedic knowledge graph stored in the database is updated based on the updated encyclopedic knowledge graph. The encyclopedic knowledge graph is thus dynamically updated by automatically and periodically updating related entries, which improves the updating efficiency and reduces the maintenance cost. This solves the problems in the related art that updates must be manually modified and reviewed, so that the encyclopedic knowledge graph cannot be automatically updated and maintained, the updating efficiency is low, and the maintenance cost is high.
The encyclopedic knowledge graph dynamic updating method will be explained in detail with reference to fig. 2, and the specific steps are as follows:
step 1, acquiring data needing to be updated
1. Count the entries in the encyclopedia pages whose total browsing count is greater than or equal to a certain number and whose modification count is greater than or equal to a fixed number, treat these entries and the entries linked in their texts as popular entries, and record the entry names, the uniform resource locators url, and the historical modification counts. Divide the entries into a fixed number of parts and scan one part each day: find the latest corresponding page by the entry name and the uniform resource locator url, extract the historical modification count of the page, and compare it with the historical modification count in the database; if the count has changed, download the page to the list to be updated.
2. If an entry that is not among the popular entries needs to be updated, its encyclopedia uniform resource locator url or its id in the knowledge graph must be provided; the corresponding latest page is then found according to the uniform resource locator url or the id, and the page content is downloaded to the list to be updated.
3. Preprocess the entry pages placed in the list to be updated in items 1 and 2 to obtain the links in the entry introductions, compare the modification counts of the linked entries with those in the current knowledge graph, download the content of the entry pages that need updating to the list to be updated, and update the historical modification counts in the database.
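The daily scan in item 1 can be sketched as follows. The round-robin partition, the function names, and the `fetch_count` callback (standing in for whatever retrieves a page's current modification count) are assumptions made for illustration.

```python
from typing import Callable, Dict, List


def daily_part(entries: List[str], num_parts: int, day_index: int) -> List[str]:
    """Return the entries scanned on a given day, cycling over the parts."""
    part = day_index % num_parts
    return [e for i, e in enumerate(entries) if i % num_parts == part]


def find_changed(urls: List[str], stored_counts: Dict[str, int],
                 fetch_count: Callable[[str], int]) -> List[str]:
    """Return the urls whose modification count differs from the stored one.

    The stored count is refreshed for every changed url, mirroring the
    update of the historical modification count in the database.
    """
    changed = []
    for url in urls:
        latest = fetch_count(url)
        if latest != stored_counts.get(url):
            changed.append(url)
            stored_counts[url] = latest
    return changed
```

The urls returned by `find_changed` correspond to the pages that would be downloaded to the list to be updated.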
Step 2, extracting information of the page and preprocessing the information
1. Periodically traverse the web pages in the list to be updated, and use html (Hyper Text Markup Language) tags to extract key information from each page, such as the name, the subtitle, the uniform resource locator url, the brief introduction, the information box infobox, the body text, and the modification records of the instance.
2. Process the body text, representing links and paragraphs with special symbols.
3. Convert the extracted key information into a unified format to facilitate subsequent processing.
Step 3, supplementing extended triples that are not in the encyclopedia
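Item 2 of this step can be illustrated with Python's standard `html.parser` module. The marker syntax (`[[anchor text|href]]` for a link and a newline for the end of a paragraph) is an assumption chosen for this sketch; the text only says that special symbols are used.

```python
from html.parser import HTMLParser
from typing import List, Optional


class PageTextExtractor(HTMLParser):
    """Collect page text, marking links and paragraph ends with symbols."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            if self._href is not None:
                self.parts.append("[[")  # open a link marker

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.parts.append("|" + self._href + "]]")  # close the link marker
            self._href = None
        elif tag == "p":
            self.parts.append("\n")  # paragraph boundary

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(html: str) -> str:
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

A real encyclopedia page would also need tag-specific handling for the title, subtitle, and infobox, which depends on the page layout and is not shown here.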
1. The belonging classification of an instance is supplemented according to the attributes of the instance in the information box infobox.
2. Extract the open relations in the body text by using an open relation extraction method:
2.1 Record the link positions in the text to generate a link dictionary.
2.2 Convert the links in the text into plain text, and convert special symbols to facilitate subsequent processing.
2.3 Extract the triples in the text, and store basic information such as the id of the instance to which each triple belongs and the position of the triple in the text.
2.4 Clean the triple results, for example by deleting triples with overlong predicates and deleting triples whose subjects, predicates, or objects contain special symbols.
2.5 Supplement the links back into the triples by using the link dictionary and the positions of the triple fields in the text.
2.6 Complete the links in the triples.
2.6.1 If the subject names of the triples in the same text segment are the same and one of the triples has a link, the link is added to the subject of that name in all the triples extracted from the text segment.
2.6.2 If the triple subject is consistent with the name of the belonged instance, the link of the belonged instance is added to the subject.
2.7 Modify the instances to which the triples belong.
2.7.1 If the link of the triple subject is the current belonged instance, the belonged instance is not modified.
2.7.2 If the triple subject has no link, the belonged instance is not modified.
2.7.3 If the triple subject has a link and the link is not the belonged instance, the belonged instance is changed to the instance linked by the subject.
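Sub-steps 2.1, 2.2, 2.4 and 2.5 above can be sketched as follows. The link marker syntax `[[anchor text|target]]`, the predicate length limit of 20 characters, the set of special symbols, and the `subject_span` key are all assumptions for illustration. The open relation extractor of sub-step 2.3 is not shown, since the text does not name one.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[int, int]
LINK = re.compile(r"\[\[(.+?)\|(.+?)\]\]")   # assumed link marker syntax
SPECIAL = re.compile(r"[\[\]|{}<>]")         # assumed set of special symbols


def build_link_dictionary(marked: str) -> Tuple[str, Dict[Span, str]]:
    """Strip link markers (2.2) and record where each link sat (2.1).

    Returns the plain text and a dictionary mapping the (start, end)
    span of each anchor text in the plain text to its link target.
    """
    parts: List[str] = []
    links: Dict[Span, str] = {}
    pos = 0       # read position in the marked text
    out_len = 0   # length of the plain text produced so far
    for m in LINK.finditer(marked):
        before = marked[pos:m.start()]
        parts.append(before)
        out_len += len(before)
        anchor = m.group(1)
        links[(out_len, out_len + len(anchor))] = m.group(2)
        parts.append(anchor)
        out_len += len(anchor)
        pos = m.end()
    parts.append(marked[pos:])
    return "".join(parts), links


def clean(triples: List[dict], max_predicate_len: int = 20) -> List[dict]:
    """Drop triples with overlong predicates or special symbols (2.4)."""
    return [
        t for t in triples
        if len(t["predicate"]) <= max_predicate_len
        and not any(SPECIAL.search(t[k]) for k in ("subject", "predicate", "object"))
    ]


def restore_links(triples: List[dict], links: Dict[Span, str]) -> List[dict]:
    """Re-attach links to subjects by their position in the text (2.5)."""
    return [{**t, "subject_link": links.get(t["subject_span"])} for t in triples]
```

Matching on exact spans is the simplest choice; an extractor that reports slightly different boundaries would need an overlap test instead.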
Step 4, structuring the processing result and generating a knowledge graph
1. The instance to be updated is aligned with the list of instances in the current knowledge graph by name, subheading, uniform resource locator url, and if already present, the instance id to be updated is set to be consistent with the current knowledge graph. If not, a new id is assigned.
2. Align the attributes in the instance to be updated with the attribute list in the current knowledge graph by name, and if an attribute already exists, set its id to be consistent with the current knowledge graph. If not, a new id is assigned.
3. Set the id of the concept to which the instance to be updated belongs to be consistent with the current knowledge graph.
4. Disambiguate by using the aligned ids in place of the names of instances, concepts, and attributes.
5. Knowledge modeling: generate a hypernym-hyponym concept hierarchy from the concept structure, the hierarchy comprising a concept list and the hypernym-hyponym relations between concepts.
6. Knowledge acquisition: generate a knowledge graph from the information in the encyclopedia, the knowledge graph comprising information such as an instance list, an attribute list, instance classifications, instance information boxes infobox, instance brief introductions, instance body texts, instance aliases, related instances, and the open relations in the text.
7. Keep the data format used in knowledge modeling and knowledge acquisition consistent with the knowledge graph in the database, so that the data can be conveniently stored in the database.
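Since the description mentions that the data format may be ttl, the following sketch serializes the attribute values of one instance as Turtle statements keyed by aligned ids rather than names. The namespace `http://example.org/kg/` and the IRI layout are hypothetical; the actual vocabulary of the stored knowledge graph is not given in the text.

```python
from typing import Dict

BASE = "http://example.org/kg/"  # hypothetical namespace, for illustration only


def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def to_turtle(instance_id: int, properties: Dict[int, str]) -> str:
    """Emit one Turtle statement per (attribute id, value) pair."""
    subject = f"<{BASE}instance/{instance_id}>"
    lines = [
        f"{subject} <{BASE}attribute/{attr_id}> {turtle_literal(value)} ."
        for attr_id, value in sorted(properties.items())
    ]
    return "\n".join(lines)
```

Using ids in the IRIs is what makes two instances with the same name distinguishable in the output.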
Step 5, updating the knowledge graph stored in the database
1. Update the attribute list by adding the newly added attributes.
2. According to the id of the instance to be updated, query and delete all information of the instance in the database, such as its entries in the instance list, instance classification, information box infobox, brief introduction, body text, aliases, related instances, and open relations in the text.
3. Import all the formatted data generated in step 4 into the database to complete the update.
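The delete-then-import pattern of step 5 can be illustrated with an in-memory stand-in for the database. The dictionary-based store and the function name `apply_update` are assumptions; against a real triple store the same three actions would be issued through that store's own update interface.

```python
from typing import Dict, List, Set, Tuple

Fact = Tuple[int, str]  # (attribute id, value)


def apply_update(store: Dict[int, List[Fact]], attributes: Set[int],
                 updates: Dict[int, List[Fact]]) -> None:
    """Replace the stored facts of every updated instance.

    store      maps an instance id to its facts
    attributes is the attribute list (set of known attribute ids)
    updates    maps an instance id to its newly generated facts
    """
    for instance_id, facts in updates.items():
        # 1. Add newly introduced attributes to the attribute list.
        attributes.update(attr_id for attr_id, _ in facts)
        # 2. Delete all existing information of the instance.
        store.pop(instance_id, None)
        # 3. Import the newly generated data.
        store[instance_id] = list(facts)
```

Deleting everything for an instance before importing avoids stale facts surviving an update, at the cost of rewriting facts that did not change.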
In conclusion, the popular entries are determined by a calculation based on the browsing count and the modification count within a period of time, and the two counts together serve as the criterion for periodic updating, which reduces meaningless web page scanning and improves the efficiency of periodic updating. The encyclopedic knowledge graph is dynamically updated by combining automatic periodic updating of popular entries with one-key updating of specified entries, which greatly reduces the workload of maintaining and updating the knowledge graph while still allowing manual intervention and supplementation, so that the maintenance of a knowledge graph at the scale of billions of records can be carried out by an individual. When an entry is updated, the entries that may be closely related to it (and may therefore also have changed) are updated at the same time, so that the updating range is more comprehensive. A dedicated method is used to extract linked triples from the encyclopedia entry text and to reassign the instances to which the extracted triples belong, so that the extended content is updated synchronously with the original encyclopedia content, the links in the triples are preserved and supplemented, the amount of knowledge in the graph is increased, and the ambiguity of entities in the triples is reduced. Because the encyclopedic knowledge graph covers the general domain, the range of knowledge that is updated is wider.
Next, an encyclopedic knowledge graph dynamic updating device proposed according to an embodiment of the present application is described with reference to the drawings.
Fig. 3 is a schematic block diagram of an encyclopedic knowledge-graph dynamic updating apparatus according to an embodiment of the present application.
As shown in fig. 3, the encyclopedic knowledge-graph dynamic updating device 10 includes: an acquisition module 100, a first update module 200, and a second update module 300.
The obtaining module 100 is configured to obtain a to-be-updated data list of the encyclopedic knowledge graph, where the to-be-updated data list includes a page text of an entry to be updated and/or a page text linked in an entry introduction; the first updating module 200 is configured to traverse the data list to be updated according to a preset updating period, extract preset key information of a page text in the data list to be updated, and update a triple in the encyclopedic knowledge graph based on the preset key information to obtain an updating result; the second updating module 300 is configured to structure the update result to obtain an updated encyclopedic knowledge graph, and update the encyclopedic knowledge graph stored in the database based on the updated encyclopedic knowledge graph.
In an embodiment of the present application, the obtaining module 100 is further configured to: acquire popular entries and/or specified entries, wherein a popular entry is an entry in the encyclopedia pages whose total browsing count is greater than or equal to a first preset number of times and whose modification count within a preset duration is greater than or equal to a second preset number of times; identify links in the entry introductions corresponding to the popular entries and the specified entries, and acquire the entry names, the uniform resource locators url, the identifiers in the encyclopedic knowledge graph and/or the historical modification counts of the popular entries, the specified entries, and the entries corresponding to the links; and acquire the latest page content of the corresponding entry according to the uniform resource locator url or the identifier, and download the latest page content to the data list to be updated if the modification count of the latest page content is greater than the historical modification count.
In this embodiment, the first update module 200 is further configured to: supplement the classification to which the instance belongs according to the attributes of the instance in the information box infobox; record the link positions in the body text, generate a link dictionary, convert the links in the body text into link text, and mark the link text with a preset symbol; extract the triples in the body text, identify the ids of the instances to which the triples belong and the positions of the triples in the body text, and clean the triples to obtain cleaned triples; and supplement the links of the triples by using the link dictionary and the positions in the body text of the triple fields of the cleaned triples, and modify the instances to which the triples belong to obtain the updating result.
In an embodiment of the present application, the first update module 200 is further configured to: if the subject names of the triples in the same text segment are the same and any triples have links, adding the same links to the same subjects in all the triples in the same text segment; if the triple subject is consistent with the name of the belonged instance, the link of the belonged instance is added to the subject.
In this embodiment, the first update module 200 is further configured to: if the link of the triple subject belongs to the current belonged example, or the triple subject does not have the link, the belonged example is not modified; if the triplet subject has a link and is not a belonged instance, then the belonged instance is changed to the linked instance of the triplet subject.
In an embodiment of the present application, the second updating module 300 is further configured to: aligning an example to be updated with an example list in the current encyclopedia knowledge graph by using a name, a subheading and a uniform resource locator url, setting the id of the example to be updated to be consistent with the current encyclopedia knowledge graph after aligning, and if the id does not exist, giving a new id; aligning the attributes in the example to be updated with the attribute list in the current encyclopedic knowledge graph by using the name, setting the id of the example to be updated to be consistent with the current encyclopedic knowledge graph after aligning, and if the id does not exist, giving a new id; setting the belonged concept id of the to-be-updated example to be consistent with an example list in the current encyclopedic knowledge graph; disambiguating by replacing names of instances, concepts, and attributes with aligned ids; and generating an upper-layer concept structure through the concept structure, generating an encyclopedia knowledge graph through the updated result information, and keeping the data formats of the upper-layer concept structure and the encyclopedia knowledge graph consistent with the encyclopedia knowledge graph in the database.
It should be noted that the above explanation of the embodiment of the encyclopedic knowledge graph dynamic updating method is also applicable to the encyclopedic knowledge graph dynamic updating device of the embodiment, and is not repeated herein.
According to the encyclopedic knowledge graph dynamic updating device provided by the embodiment of the application, the data list to be updated of the encyclopedic knowledge graph is obtained, the data list to be updated is traversed according to the preset updating period, the key information of the page text in the data list is extracted, the triples in the encyclopedic knowledge graph are updated based on the key information to obtain the updating result, the updating result is structured to obtain the updated encyclopedic knowledge graph, and the encyclopedic knowledge graph stored in the database is updated based on the updated encyclopedic knowledge graph. The encyclopedic knowledge graph is thus dynamically updated by automatically and periodically updating related entries, which improves the updating efficiency and reduces the maintenance cost. This solves the problems in the related art that updates must be manually modified and reviewed, so that the encyclopedic knowledge graph cannot be automatically updated and maintained, the updating efficiency is low, and the maintenance cost is high.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
memory 401, processor 402, and computer programs stored on memory 401 and operable on processor 402.
The processor 402, when executing the program, implements the method for dynamically updating the encyclopedic knowledge graph provided in the above embodiments.
Further, the electronic device further includes:
a communication interface 403 for communication between the memory 401 and the processor 402.
A memory 401 for storing computer programs executable on the processor 402.
The Memory 401 may include a high-speed RAM (Random Access Memory) Memory, and may also include a non-volatile Memory, such as at least one disk Memory.
If the memory 401, the processor 402 and the communication interface 403 are implemented independently, the communication interface 403, the memory 401 and the processor 402 may be connected to each other through a bus and perform communication with each other. The bus may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 4, but that does not indicate only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 401, the processor 402, and the communication interface 403 are integrated on a chip, the memory 401, the processor 402, and the communication interface 403 may complete mutual communication through an internal interface.
Processor 402 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present Application.
Embodiments of the present application also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the above method for dynamically updating an encyclopedic knowledge graph.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or N executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of implementing the embodiments of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a programmable gate array, a field programmable gate array, or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are exemplary and should not be construed as limiting the present application and that changes, modifications, substitutions and alterations in the above embodiments may be made by those of ordinary skill in the art within the scope of the present application.

Claims (7)

1. An encyclopedic knowledge graph dynamic updating method is characterized by comprising the following steps:
acquiring a popular entry and/or an appointed entry, wherein the popular entry is an entry with the total browsing times of which is greater than or equal to a first preset time and the modification times of which is greater than or equal to a second preset time in a preset time length in a statistical encyclopedic page;
identifying links in the popular entries and entry blurbs corresponding to the specified entries, and acquiring entry names, uniform Resource Locators (URL), identifications in an encyclopedia knowledge map and/or historical modification times of the popular entries, the specified entries and the corresponding entries;
acquiring the latest page content of the corresponding entry according to the uniform resource locator url or the identification, and downloading the latest page content to a data list to be updated if the modification times of the latest page content are greater than the historical modification times, wherein the data list to be updated comprises page texts of the entry to be updated and/or linked page texts in the entry introduction;
traversing the data list to be updated according to a preset updating period, extracting preset key information of a page text in the data list to be updated, updating a triple in the encyclopedic knowledge graph based on the preset key information to obtain an updating result, wherein the preset key information comprises one or more of a name, a subtitle, a uniform resource locator url, a brief introduction, an information frame infobox, a text and a modification record of an example, and the triple in the text in the encyclopedic knowledge graph is updated based on the preset key information to obtain the updating result, which comprises the following steps: supplementing the belonging classification of the instance according to the attribute of the instance in the information box infobox; recording the link position in the text, generating a link dictionary, converting the link in the text into a link text, and identifying the link text by using a preset symbol; extracting triples in the text, identifying the instance id to which the triples belong and the positions of the triples in the text, and cleaning the triples to obtain cleaned triples; supplementing the link of the triple by using the link dictionary and the position of the triple field in the cleaned triple in the text, and modifying the example to which the triple belongs to obtain an updating result;
structuring the updating result to obtain an updated encyclopedic knowledge graph, and updating the encyclopedic knowledge graph stored in the database based on the updated encyclopedic knowledge graph.
2. The method of claim 1, wherein supplementing the link of the triplet with the location in text of the triplet field in the link dictionary and the cleaned triplet comprises:
if the subject names of the triples in the same text segment are the same and any triples have links, adding the same links to the same subjects in all the triples in the same text segment;
and if the triple subject is consistent with the name of the belonged instance, adding a link of the belonged instance in the subject.
3. The method of claim 1, wherein modifying the instance to which the triplet belongs comprises:
if the link of the triple subject belongs to the current belonged example, or the triple subject does not have the link, not modifying the belonged example;
if the triplet subject has a link and is not a belonged instance, the belonged instance is changed to the linked instance of the triplet subject.
4. The method of claim 1, wherein the structuring the updated result to obtain a structured encyclopedia knowledge graph comprises:
aligning an example to be updated with an example list in the current encyclopedia knowledge graph by using a name, a subheading and a uniform resource locator url, setting the id of the example to be updated to be consistent with the current encyclopedia knowledge graph after aligning, and if the id does not exist, giving a new id;
aligning the attributes in the to-be-updated instance with the attribute list in the current encyclopedic knowledge graph by using the names, setting the id of the to-be-updated instance to be consistent with the current encyclopedic knowledge graph after aligning, and if the id does not exist, giving a new id;
setting the concept id of the example to be updated to be consistent with the example list in the current encyclopedic knowledge graph;
disambiguating using the aligned id instead of the name of the instance, concept, and attribute;
generating an upper-layer concept structure through a concept structure, generating an encyclopedic knowledge graph through the information of the updated result, and keeping the data format of the upper-layer concept structure and the data format of the encyclopedic knowledge graph consistent with the data format of the encyclopedic knowledge graph in the database.
5. An encyclopedic knowledge graph dynamic updating device is characterized by comprising:
an acquisition module, configured to acquire popular entries and/or specified entries, wherein a popular entry is an entry in the statistical encyclopedic pages whose total browsing times are greater than or equal to a first preset number of times and whose modification times within a preset duration are greater than or equal to a second preset number of times; identify links in the entry blurbs corresponding to the popular entries and the specified entries, and acquire entry names, uniform resource locators url, identifications in the encyclopedic knowledge graph and/or historical modification times of the popular entries, the specified entries and the entries corresponding to the links; and acquire the latest page content of the corresponding entry according to the uniform resource locator url or the identification, and download the latest page content to a data list to be updated if the modification times of the latest page content are greater than the historical modification times, wherein the data list to be updated comprises page texts of the entry to be updated and/or linked page texts in the entry introduction;
a first updating module, configured to traverse the data list to be updated according to a preset updating period, extract preset key information of a page text in the data list to be updated, and update a triple in the encyclopedic knowledge graph based on the preset key information to obtain an updating result, wherein the preset key information includes one or more of a name, a subtitle, a uniform resource locator url, a brief introduction, an information box infobox, a body text, and a modification record of an instance, and updating the triple of the text in the encyclopedic knowledge graph based on the preset key information to obtain the updating result includes: supplementing the belonging classification of the instance according to the attribute of the instance in the information box infobox; recording the link position in the text, generating a link dictionary, converting the link in the text into a link text, and identifying the link text by using a preset symbol; extracting triples in the text, identifying the instance id to which the triples belong and the positions of the triples in the text, and cleaning the triples to obtain cleaned triples; supplementing the link of the triple by using the link dictionary and the position of the triple field in the cleaned triple in the text, and modifying the instance to which the triple belongs to obtain the updating result;
and a second updating module, configured to structure the update result to obtain an updated encyclopedic knowledge graph, and to update the encyclopedic knowledge graph stored in the database based on the updated encyclopedic knowledge graph.
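As an illustration only, the selection logic recited above for the acquisition module might be sketched in Python as follows. The `Entry` fields, the function names, and the threshold parameters are hypothetical and do not appear in the claims; fetching and downloading the pages is omitted, and the latest modification counts are assumed to be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # All field names are illustrative assumptions, not terms from the claims.
    name: str
    url: str
    total_views: int               # total view count of the encyclopedia page
    recent_modifications: int      # modifications within the preset duration
    historical_modifications: int  # modification count recorded at the last crawl


def is_popular(entry, min_views, min_recent_mods):
    """Popular: both counts reach their preset thresholds."""
    return (entry.total_views >= min_views
            and entry.recent_modifications >= min_recent_mods)


def build_update_list(entries, latest_counts, min_views, min_recent_mods,
                      designated=frozenset()):
    """Return names of popular or designated entries whose page has changed.

    latest_counts maps an entry name to the modification count of its
    latest page content (assumed to have been fetched elsewhere).
    """
    to_update = []
    for entry in entries:
        wanted = (is_popular(entry, min_views, min_recent_mods)
                  or entry.name in designated)
        if not wanted:
            continue
        latest = latest_counts.get(entry.name, entry.historical_modifications)
        # Only queue the page when it was modified since the last recorded count.
        if latest > entry.historical_modifications:
            to_update.append(entry.name)
    return to_update
```

In this sketch an entry with no known latest count is treated as unchanged, which is a design choice of the example rather than something the claims specify.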
6. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the encyclopedic knowledge-graph dynamic update method according to any one of claims 1 to 4.
7. A computer-readable storage medium, on which a computer program is stored, the program being executable by a processor for implementing the method for dynamic update of an encyclopedic knowledge graph according to any one of claims 1 to 4.
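The link-dictionary step recited for the first updating module (recording link positions, converting links to marked link text, and later supplementing a triple field with its link) can be sketched roughly as below. This is a minimal sketch under stated assumptions: the input is assumed to contain simple HTML anchors, `[[` and `]]` stand in for the unspecified "preset symbols", and the function names are hypothetical.

```python
import re

# Assumption: links appear as simple HTML anchors with a double-quoted href.
ANCHOR = re.compile(r'<a href="([^"]*)">([^<]*)</a>')
OPEN, CLOSE = "[[", "]]"  # hypothetical stand-ins for the preset symbols


def convert_links(html):
    """Replace each anchor with marked link text.

    Returns (text, link_dict), where link_dict maps the (start, end)
    offsets of the link text inside the returned text to the link target.
    """
    parts = []
    link_dict = {}
    cursor = 0   # read position in the input
    length = 0   # length of the output produced so far
    for match in ANCHOR.finditer(html):
        before = html[cursor:match.start()]
        parts.append(before)
        length += len(before)
        url, label = match.group(1), match.group(2)
        start = length + len(OPEN)
        link_dict[(start, start + len(label))] = url
        marked = OPEN + label + CLOSE
        parts.append(marked)
        length += len(marked)
        cursor = match.end()
    parts.append(html[cursor:])
    return "".join(parts), link_dict


def link_for_span(link_dict, start, end):
    """Return the target of the first link lying inside [start, end), else None."""
    for (s, e), url in link_dict.items():
        if s >= start and e <= end:
            return url
    return None
```

Given the position of a triple field in the converted text, `link_for_span` looks up whether that field covers a recorded link, which is one way the "supplementing the links of the triples" step could be realized.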
CN202211681737.5A 2022-12-27 2022-12-27 Encyclopedic knowledge graph dynamic updating method, device, equipment and medium Active CN115658931B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211681737.5A CN115658931B (en) 2022-12-27 2022-12-27 Encyclopedic knowledge graph dynamic updating method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211681737.5A CN115658931B (en) 2022-12-27 2022-12-27 Encyclopedic knowledge graph dynamic updating method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN115658931A CN115658931A (en) 2023-01-31
CN115658931B true CN115658931B (en) 2023-04-07

Family

ID=85023356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211681737.5A Active CN115658931B (en) 2022-12-27 2022-12-27 Encyclopedic knowledge graph dynamic updating method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115658931B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114064910A (en) * 2021-09-29 2022-02-18 清华大学 Knowledge graph construction method and system

Family To Family Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
HK1220319A2 (en) * 2016-07-29 2017-04-28 李應樵 Method, system and computer-readable medium for automatic chinese ontology generation based on structured web knowledge
CN108563710B (en) * 2018-03-27 2021-02-02 腾讯科技(深圳)有限公司 Knowledge graph construction method and device and storage medium
CN110019840B (en) * 2018-07-20 2021-06-15 腾讯科技(深圳)有限公司 Method, device and server for updating entities in knowledge graph
CN112905804B (en) * 2021-02-22 2022-08-26 国网电力科学研究院有限公司 Dynamic updating method and device for power grid dispatching knowledge graph

Also Published As

Publication number Publication date
CN115658931A (en) 2023-01-31

Similar Documents

Publication Publication Date Title
CN111930966B (en) Intelligent policy matching method and system for digital government affairs
CN106503211B (en) Method for automatically generating mobile version facing information publishing website
CN102651002A (en) Webpage information extracting method and system
CN109657121A (en) A kind of Web page information acquisition method and device based on web crawlers
CN112052414A (en) Data processing method and device and readable storage medium
CN113239111A (en) Network public opinion visual analysis method and system based on knowledge graph
Du et al. Managing knowledge on the Web–Extracting ontology from HTML Web
CN103440315A (en) Web page cleaning method based on theme
CN115358200A (en) Template document automatic generation method based on SysML meta model
CN111652658A (en) Portrait fusion method, apparatus, electronic device and computer readable storage medium
WO2023071127A1 (en) Policy recommended method and apparatus, device, and storage medium
CN116245177A (en) Geographic environment knowledge graph automatic construction method and system and readable storage medium
CN105740355A (en) Aggregated text density based webpage body text extraction method and apparatus
CN113157978B (en) Data label establishing method and device
CN115658931B (en) Encyclopedic knowledge graph dynamic updating method, device, equipment and medium
Liu et al. An XML-enabled data extraction toolkit for web sources
CN112883202A (en) Knowledge graph-based multi-component modeling method and system
CN116244476A (en) Method and system for realizing pre-labeling front-end visualization based on rich text
CN110716913A (en) Mutual migration method for Kafka and Elasticsearch database data
CN116303641A (en) Laboratory report management method supporting multi-data source visual configuration
Borek et al. Information organization and access in digital humanities: TaDiRAH revised, formalized and FAIR
Bürgermeister Extending versioning in collaborative research
Karami et al. Maintaining accurate web usage models using updates from activity diagrams
CN114741077A (en) Page effect preview method, device, equipment and medium based on field granularity
CN114519106A (en) Document level entity relation extraction method and system based on graph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant