CN115203428B - Knowledge graph construction method and device, electronic equipment and storage medium


Info

Publication number
CN115203428B
Authority
CN
China
Legal status
Active
Application number
CN202210604131.5A
Other languages
Chinese (zh)
Other versions
CN115203428A
Inventor
郑烨翰
陆超
蔡远俊
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210604131.5A
Publication of CN115203428A
Application granted
Publication of CN115203428B
Active (current legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The disclosure provides a knowledge graph construction method and device, electronic equipment and a storage medium. It relates to artificial intelligence fields such as knowledge graphs, natural language processing and deep learning, and can be applied to scenarios such as intelligent creation. The specific implementation scheme is as follows: parsing first document data to obtain multiple types of data to be processed; matching the data to be processed with industry sample data to obtain entity objects that match the industry sample data, where the entity objects represent classifications of industry knowledge; extracting, from the data to be processed, a first sub-object related to an entity object; and constructing an industry knowledge graph according to the entity objects and the first sub-objects. The method and device can reduce the cost of constructing an industry knowledge graph.

Description

Knowledge graph construction method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to fields such as knowledge graphs, natural language processing and deep learning, and can be applied to scenarios such as intelligent creation.
Background
A knowledge graph is an important branch of artificial intelligence. It is a structured semantic knowledge base that describes, in symbolic form, concepts in the physical world and the relationships among them. Its basic unit is the "entity-relation-entity" triple: entities are connected to one another through relations, forming a networked knowledge structure.
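The triple structure can be made concrete with a minimal Python sketch of an in-memory triple store. The class name `TripleGraph`, its methods and the example entities are illustrative assumptions and do not come from the disclosure.

```python
from collections import defaultdict


class TripleGraph:
    """Minimal in-memory store of (head, relation, tail) triples (illustrative only)."""

    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()
        # Outgoing edges per entity, kept in insertion order.
        self._out: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, head: str, relation: str, tail: str) -> None:
        triple = (head, relation, tail)
        if triple not in self.triples:  # ignore duplicate triples
            self.triples.add(triple)
            self._out[head].append((relation, tail))

    def neighbors(self, entity: str) -> list[tuple[str, str]]:
        """Return the (relation, tail) pairs leaving `entity`."""
        return list(self._out.get(entity, []))


g = TripleGraph()
g.add("power transformer", "is_a", "transformer")
g.add("power transformer", "rated_voltage", "110 kV")
g.add("power transformer", "is_a", "transformer")  # duplicate, ignored
print(g.neighbors("power transformer"))
# [('is_a', 'transformer'), ('rated_voltage', '110 kV')]
```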
By function and application scenario, knowledge graphs can be divided into general knowledge graphs and industry knowledge graphs. A general knowledge graph covers the general domain, emphasizes breadth of knowledge, consists of structured encyclopedic knowledge, and mainly serves ordinary users. An industry knowledge graph covers a specific domain, emphasizes depth of knowledge, usually has to be built from the industry's own databases, and targets practitioners and prospective practitioners in that industry.
In the related art, constructing an industry knowledge graph requires in-depth understanding of a large amount of industry knowledge and annotation of a large amount of professional training data, so the whole construction process is costly and slow.
Disclosure of Invention
The disclosure provides a knowledge graph construction method, a knowledge graph construction device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a knowledge graph construction method, including:
analyzing the first document data to obtain various types of data to be processed;
matching the data to be processed with industry sample data to obtain entity objects matched with the industry sample data, wherein the entity objects are used for representing classification of industry knowledge;
extracting a first sub-object related to the entity object from the data to be processed;
and constructing an industry knowledge graph according to the entity object and the first sub-object.
According to another aspect of the present disclosure, there is provided a knowledge graph construction apparatus, including:
the analysis module is used for analyzing the first document data to obtain various types of data to be processed;
the matching module is used for matching the data to be processed with industry sample data to obtain entity objects matched with the industry sample data, and the entity objects are used for representing classification of industry knowledge;
the first extraction module is used for extracting a first sub-object related to the entity object from the data to be processed;
and the first construction module is used for constructing an industry knowledge graph according to the entity object and the first sub-object.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods provided by any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method provided by any one of the embodiments of the present disclosure.
With the present disclosure, the first document data is parsed to obtain multiple types of data to be processed. The data to be processed is matched with industry sample data to obtain entity objects that match the industry sample data and represent classifications of industry knowledge. A first sub-object related to each entity object is then extracted from the data to be processed, so that an industry knowledge graph can be constructed from the entity objects and the first sub-objects, which reduces the cost of constructing the industry knowledge graph.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are provided for a better understanding of the present solution and do not limit the present disclosure. In the drawings:
FIG. 1 is a schematic diagram of a distributed cluster processing scenario in accordance with an embodiment of the present disclosure;
FIG. 2 is a flow diagram of a knowledge graph construction method, according to an embodiment of the disclosure;
FIG. 3 is a schematic diagram of a knowledge graph construction scenario in an application example, according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of building a multi-tier business architecture based on parsing of document data in an application example in accordance with an embodiment of the present disclosure;
FIG. 5 is a system architecture diagram of building a knowledge-graph and its business applications in an application example in accordance with an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a composition structure of a knowledge graph construction apparatus according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device for implementing a knowledge graph construction method of an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, or B alone. The term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B and C may mean including any one or more elements selected from the set consisting of A, B and C. The terms "first" and "second" herein distinguish between multiple similar technical terms and imply neither an order nor a count of exactly two; for example, a first feature and a second feature denote two types/classes of features, and there may be one or more first features and one or more second features.
In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
For constructing knowledge graphs for an industry (or an enterprise), related technical means require a large construction investment, cannot keep pace with the current rapid growth in data volume and knowledge volume, and are difficult to deploy in large-scale applications.
An industry (or enterprise) holds a large amount of content-rich document data. An industry knowledge graph that serves each business application scenario of the industry can be extracted from this document data, and only a small number of industry data samples are needed for the construction, which reduces construction cost.
FIG. 1 is a schematic diagram of a distributed cluster processing scenario according to an embodiment of the present disclosure. The distributed cluster system is one example of a cluster system; for example, the industry knowledge graph may be constructed using the distributed cluster system. As shown in FIG. 1, the distributed cluster system includes a plurality of nodes (such as a server cluster 101, a server 102, a server cluster 103, a server 104 and a server 105, where the server 105 may further be connected to electronic devices such as a mobile phone 1051 and a desktop computer 1052), and the nodes and the connected electronic devices may jointly perform one or more relationship extraction tasks. Optionally, the nodes in the distributed cluster system may perform some or all of the tasks involved in constructing the industry knowledge graph in a data-parallel manner, improving the accuracy and efficiency of the construction.
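Data-parallel execution is illustrated below with a small Python sketch. A thread pool stands in for the cluster nodes, and the per-node task `parse_document` (splitting a document into paragraphs) is a hypothetical placeholder for whatever construction subtask a node runs. A real cluster would distribute work across machines rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_document(text: str) -> list[str]:
    """Hypothetical per-node task: split one document into non-empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def parallel_parse(documents: list[str], workers: int = 4) -> list[list[str]]:
    """Data-parallel parsing: different workers handle different documents.

    ThreadPoolExecutor stands in for the cluster nodes; Executor.map returns
    results in the same order as the input documents.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse_document, documents))


docs = [
    "Scope\n\nThis specification covers 110 kV power transformers.",
    "Service conditions\n\nAmbient temperature requirements.",
]
print(parallel_parse(docs, workers=2))
```

Because `Executor.map` preserves input order, results can be merged back without tracking which node produced them.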
According to an embodiment of the present disclosure, a knowledge graph construction method is provided. FIG. 2 is a schematic flowchart of the knowledge graph construction method according to an embodiment of the present disclosure. The method may be applied to a knowledge graph construction apparatus; for example, the apparatus may be deployed in a terminal, a server or another processing device in a stand-alone, multi-machine or cluster system, and may implement processing such as knowledge graph construction. The terminal may be user equipment (UE), a mobile device, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. In some possible implementations, the method may also be implemented by a processor invoking computer-readable instructions stored in a memory. As shown in FIG. 2, the method, applied to any node or electronic device (such as a mobile phone or desktop computer) in the cluster system shown in FIG. 1, includes:
S201, analyzing the first document data to obtain various types of data to be processed.
S202, matching the data to be processed with industry sample data to obtain entity objects matched with the industry sample data, wherein the entity objects are used for representing classification of industry knowledge.
S203, extracting a first sub-object related to the entity object from the data to be processed.
S204, constructing an industry knowledge graph according to the entity object and the first sub-object.
In an example of S201 to S204, the multiple types of data to be processed include, but are not limited to, words, sentences, paragraphs, chapters, tables, pictures, formulas and charts in the document data. They may also be called multi-granularity data to be processed, because each of "word", "sentence", "paragraph", "chapter", "table", "picture", "formula" and "chart" is a granularity at which the first document data is divided by data type. In other words, parsing the first document data yields multiple types of data to be processed at different division granularities. The multiple types of data to be processed are matched with a small amount of industry sample data, for example through named entity recognition (NER), to obtain entity objects that match the industry sample data. A first sub-object related to an entity object (such as a facet) is then extracted from the multiple types of data to be processed, so that the industry knowledge graph can be constructed quickly from the entity objects and the first sub-objects. Here, a facet is one dimension of a thing's multidimensional attributes: a book, for example, has facets such as subject, author and year; for industrial knowledge, facets may include process requirements, process flows and the like.
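Steps S201 to S204 can be illustrated end to end with a deliberately simplified Python sketch. It is not the disclosed implementation. Plain-text paragraph/sentence splitting stands in for multi-granularity parsing, case-insensitive dictionary lookup of sample terms stands in for NER, and cue words stand in for facet extraction. All function names, the `has_facet` relation and the sample data are assumptions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    granularity: str  # only "paragraph" and "sentence" in this sketch
    text: str


def parse(document: str) -> list[Segment]:
    """S201 (simplified): split plain text into paragraph and sentence segments.
    Tables, pictures, formulas and charts would need layout-aware parsers."""
    segments = []
    for para in (p.strip() for p in document.split("\n\n")):
        if not para:
            continue
        segments.append(Segment("paragraph", para))
        for sent in re.split(r"(?<=[.;!?])\s+", para):
            segments.append(Segment("sentence", sent))
    return segments


def match_entities(segments: list[Segment], industry_samples: set[str]) -> set[str]:
    """S202 (stand-in for NER): case-insensitive lookup of sample terms."""
    found = set()
    for seg in segments:
        lowered = seg.text.lower()
        found.update(t for t in industry_samples if t.lower() in lowered)
    return found


def extract_facets(segments: list[Segment], entity: str, facet_cues: dict[str, str]) -> set[str]:
    """S203 (stand-in): record a facet when a sentence mentions the entity
    together with a cue word for that facet."""
    facets = set()
    for seg in segments:
        lowered = seg.text.lower()
        if seg.granularity == "sentence" and entity.lower() in lowered:
            facets.update(f for cue, f in facet_cues.items() if cue in lowered)
    return facets


def build_graph(document: str, industry_samples: set[str],
                facet_cues: dict[str, str]) -> set[tuple[str, str, str]]:
    """S204: emit (entity, "has_facet", facet) triples."""
    segments = parse(document)
    return {
        (entity, "has_facet", facet)
        for entity in match_entities(segments, industry_samples)
        for facet in extract_facets(segments, entity, facet_cues)
    }


doc = ("The power transformer shall operate within the specified ambient temperature range. "
       "Winding insulation follows the test procedure.\n\nAnnex A lists drawings.")
samples = {"power transformer", "winding"}
cues = {"temperature": "operating environment", "procedure": "process"}
print(build_graph(doc, samples, cues))
```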
With this method, the first document data is parsed into multiple types of data to be processed, entity objects representing classifications of industry knowledge are obtained by matching that data with industry sample data, and first sub-objects related to the entity objects are extracted, so that the industry knowledge graph is constructed from the entity objects and the first sub-objects at reduced cost.
In one embodiment, the method further includes: constructing a first data structure for querying according to the entity object, and constructing a second data structure for querying according to the first sub-object.
In an example, the first data structure may be constructed based on the entity objects and used to describe the context of each entity object (it may be referred to as an entity context system); taking a facet as the first sub-object, the second data structure may be constructed based on the facets (it may be referred to as a facet system).
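The disclosure does not fix a concrete layout for these two structures. The Python sketch below shows one possible in-memory shape. The entity context system is modelled as a hypernym/hyponym tree, and the facet system as an entity-to-facet-to-values map. The class names mirror the terms in the text, but both layouts are assumptions.

```python
from collections import defaultdict
from typing import Optional


class EntityContextSystem:
    """Hypothetical first data structure: a hypernym/hyponym tree of entities.
    Entities without a recorded parent are treated as roots."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.children: dict[str, set[str]] = defaultdict(set)

    def add(self, entity: str, parent: Optional[str] = None) -> None:
        if parent is not None:
            self.parent[entity] = parent
            self.children[parent].add(entity)

    def ancestors(self, entity: str) -> list[str]:
        """Walk upward from `entity` to its root (assumes the tree has no cycles)."""
        chain = []
        while entity in self.parent:
            entity = self.parent[entity]
            chain.append(entity)
        return chain


class FacetSystem:
    """Hypothetical second data structure: entity -> facet -> set of values."""

    def __init__(self) -> None:
        self._facets: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))

    def add(self, entity: str, facet: str, value: str) -> None:
        self._facets[entity][facet].add(value)

    def lookup(self, entity: str, facet: str) -> set[str]:
        # .get() avoids creating empty entries for unknown keys
        return set(self._facets.get(entity, {}).get(facet, set()))


ecs = EntityContextSystem()
ecs.add("transformer", "electrical equipment")
ecs.add("power transformer", "transformer")
fs = FacetSystem()
fs.add("power transformer", "operating environment", "ambient temperature")
print(ecs.ancestors("power transformer"), fs.lookup("power transformer", "operating environment"))
```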
By adopting this embodiment, multiple data structures for query scenarios can be constructed according to the entity object and the first sub-object, so that the required industry knowledge graph is finally constructed in a way that adapts more accurately to query scenarios.
In one embodiment, constructing the industry knowledge graph according to the entity object and the first sub-object includes: reorganizing the data structure of the first document data according to the first data structure and the second data structure to obtain second document data; and constructing the industry knowledge graph according to the second document data.
In an example, with the first data structure being the entity context system and the second data structure being the facet system (taking facets as the first sub-object), the industry knowledge graph can be constructed after the data has been reorganized through these multiple data structures.
By adopting this embodiment, multiple data structures for query scenarios can be constructed according to the entity object and the first sub-object. After the unordered first document data is reorganized through these data structures, the resulting second document data is ordered and conforms to the corresponding data formats. The industry knowledge graph constructed from it fits query scenarios more accurately, so that when a query request is received later, a matching query result can be found quickly.
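A minimal Python sketch can show what "reorganizing" unordered document data might look like. Content blocks are regrouped under (entity, facet) keys, giving an ordered index over the original blocks. The `ContentBlock` type, the cue-word mapping and the fallback facet `"general"` are assumptions made for illustration. The disclosure does not specify the format of the second document data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentBlock:
    block_id: str
    text: str


def reorganize(blocks: list[ContentBlock], entities: list[str],
               facet_cues: dict[str, str]) -> dict[tuple[str, str], list[str]]:
    """Regroup unordered content blocks under (entity, facet) keys.

    `entities` (from the entity context system) and `facet_cues` (cue word ->
    facet, from the facet system) are hypothetical inputs; a block that mentions
    an entity but no cue word is filed under the facet "general".
    """
    index: dict[tuple[str, str], list[str]] = defaultdict(list)
    for block in blocks:
        lowered = block.text.lower()
        for entity in entities:
            if entity.lower() not in lowered:
                continue
            facets = [f for cue, f in facet_cues.items() if cue in lowered] or ["general"]
            for facet in dict.fromkeys(facets):  # de-duplicate, keep order
                index[(entity, facet)].append(block.block_id)
    return dict(index)


blocks = [
    ContentBlock("b1", "Ambient temperature limits for the power transformer."),
    ContentBlock("b2", "Power transformer factory test procedure."),
    ContentBlock("b3", "Packaging and transport notes."),
    ContentBlock("b4", "Power transformer nameplate."),
]
cues = {"temperature": "environment", "procedure": "process"}
print(reorganize(blocks, ["power transformer"], cues))
```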
In one embodiment, the method further includes: extracting a second sub-object from the data to be processed, where the second sub-object is a first semantic tag obtained through semantic analysis.
In some examples, in addition to the entity object and the first sub-object (such as a facet) related to it, a second sub-object (such as a first semantic tag) may be included, so that the required industry knowledge graph can be better constructed based on at least two of the entity object, the first sub-object and the second sub-object. For the second sub-object, a third data structure for querying can be constructed. The data structure of the first document data is then reorganized according to the third data structure to obtain third document data, and the industry knowledge graph is constructed according to the third document data.
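The semantic analysis is not specified in detail. As a stand-in, the Python sketch below maps surface terms to concept tags through a small lexicon. It then adds every broader concept reachable in a toy concept graph, and builds a third data structure from tags to content blocks. The lexicon, the concept graph, the function names and the tag-to-block layout are all hypothetical.

```python
def semantic_tags(text: str, concept_of: dict[str, str], broader: dict[str, str]) -> set[str]:
    """Hypothetical stand-in for semantic analysis: map surface terms to concept
    tags, then add every broader concept reachable in the concept graph."""
    tags: set[str] = set()
    lowered = text.lower()
    for term, concept in concept_of.items():
        if term in lowered:
            # Walk up the concept graph; stop at a root or at an already-seen tag.
            while concept is not None and concept not in tags:
                tags.add(concept)
                concept = broader.get(concept)
    return tags


def build_tag_index(blocks: dict[str, str], concept_of: dict[str, str],
                    broader: dict[str, str]) -> dict[str, list[str]]:
    """Third data structure (assumed layout): semantic tag -> ids of blocks carrying it."""
    index: dict[str, list[str]] = {}
    for block_id, text in blocks.items():
        for tag in sorted(semantic_tags(text, concept_of, broader)):
            index.setdefault(tag, []).append(block_id)
    return index


concept_of = {"ambient temperature": "temperature condition", "humidity": "humidity condition"}
broader = {"temperature condition": "environmental condition",
           "humidity condition": "environmental condition"}
blocks = {"b1": "The ambient temperature shall stay within limits.",
          "b2": "Relative humidity requirements.",
          "b3": "Nameplate data."}
print(build_tag_index(blocks, concept_of, broader))
```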
By adopting this implementation, the required industry knowledge graph can be better constructed according to at least two of the data structures built from the entity object, the first sub-object and the second sub-object. For the second sub-object, reorganizing the unordered first document data through the third data structure yields third document data that is ordered and conforms to the corresponding data format. The industry knowledge graph constructed from it is adapted to query scenarios, so that when a query request is received later, a matching query result can be found quickly.
In one embodiment, the method further includes: performing semantic analysis on an acquired query request to obtain a second semantic tag described by the query request, and updating the industry knowledge graph according to the second semantic tag.
In some examples, a query request initiated by a terminal (or information recommended to the terminal) also contains data useful for constructing the industry knowledge graph. For example, in a query scenario, the user's query intention (or query requirement) expressed by the query request may be analyzed to derive a second semantic tag, and the industry knowledge graph can be updated according to that tag.
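The Python sketch below shows the update step under heavy simplification. Cue words in the query stand in for intent analysis, and the derived tags are attached to every known entity the query mentions. The graph layout (entity to set of tags), the `intent_lexicon` and the function name are assumptions. A production system would more likely use a trained intent model.

```python
def update_from_query(graph: dict[str, set[str]], query: str,
                      known_entities: set[str], intent_lexicon: dict[str, str]) -> set[str]:
    """Derive second semantic tags from a query and attach them to the entities it mentions.

    `graph` (entity -> tags), `known_entities` and `intent_lexicon` (cue word -> tag)
    are hypothetical inputs; cue-word lookup stands in for real intent analysis.
    """
    lowered = query.lower()
    tags = {tag for cue, tag in intent_lexicon.items() if cue in lowered}
    for entity in known_entities:
        if entity.lower() in lowered:
            graph.setdefault(entity, set()).update(tags)
    return tags


graph = {"power transformer": {"equipment"}}
lexicon = {"affect": "influence factor", "temperature": "temperature condition", "price": "cost"}
tags = update_from_query(graph, "How does ambient temperature affect a power transformer?",
                         {"power transformer"}, lexicon)
print(tags, graph)
```

If the query mentions no known entity, the tags are still returned but the graph is left unchanged.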
By adopting this embodiment, data useful for constructing the industry knowledge graph can be extracted from the document data itself, avoiding the related art's reliance on large amounts of industry sample data and annotated data. The industry knowledge graph can be updated, and query requests can also be answered according to it, yielding query results that include document content blocks in the document data. In addition, the position of a document content block in the document data can be located through visual interaction, based on the coordinates of that block in the document data.
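Locating a block by its coordinates can be sketched as follows. A parse-time layout map gives each block a page number and bounding box. The viewer scrolls to whichever matched block comes first in reading order (page, then top edge, then left edge). The `BlockLocation` type, the coordinate convention and the reading-order rule are assumptions, not details from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockLocation:
    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), origin at top-left of page


def jump_target(block_ids: list[str], layout: dict[str, BlockLocation]) -> BlockLocation:
    """Pick the block a viewer should scroll to: the earliest one in reading order
    (page, then top edge, then left edge). `layout` is a hypothetical map
    recorded when the document was parsed."""
    located = [layout[b] for b in block_ids if b in layout]
    if not located:
        raise KeyError("none of the matched blocks has recorded coordinates")
    return min(located, key=lambda loc: (loc.page, loc.bbox[1], loc.bbox[0]))


layout = {
    "b1": BlockLocation(5, (72.0, 400.0, 520.0, 460.0)),
    "b2": BlockLocation(3, (72.0, 600.0, 520.0, 640.0)),
    "b3": BlockLocation(3, (72.0, 120.0, 520.0, 180.0)),
}
print(jump_target(["b1", "b2", "b3"], layout))
```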
In one embodiment, the method further includes: acquiring request data in the query request; matching the request data and a multi-dimensional query index against the industry knowledge graph to obtain multi-dimensional semantic blocks; aggregating, according to the multi-dimensional semantic blocks, a plurality of document content blocks in the document data corresponding to the request data to obtain an aggregation result; and displaying the aggregation result in the same document page. The multi-dimensional query index includes at least two of the entity object, the first sub-object and the second sub-object.
In some examples, the request data may be a query for transformer-related parameters in certain document data (such as parameters describing the influence of ambient temperature on a transformer). According to the request data and the multi-dimensional query index, the transformer-related parameters are matched against the industry knowledge graph. This yields multi-dimensional semantic blocks that match the multi-dimensional query index: the content of such a block matches the transformer-related parameters, or stands in a hypernym-hyponym relationship with them, and so on. The aggregation result obtained by aggregating these semantic blocks is displayed in the same document page.
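Multi-dimensional matching and aggregation can be illustrated with a small Python sketch. Each content block carries an entity, a facet and a semantic tag. A block is kept when it matches at least two of the requested dimensions, mirroring the "at least two of" wording above. The kept blocks are then joined in document order into the text of one result page. The `IndexedBlock` type, the `min_hits` threshold rule and the sample blocks are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedBlock:
    block_id: str
    order: int   # position of the block in the source document
    entity: str  # entity object
    facet: str   # first sub-object
    tag: str     # second sub-object (semantic tag)
    text: str


def aggregate(blocks: list[IndexedBlock], query_dims: dict[str, str], min_hits: int = 2) -> str:
    """Keep blocks that match at least `min_hits` of the requested dimensions
    and join them, in document order, into the text of one result page."""
    def hits(block: IndexedBlock) -> int:
        return sum(getattr(block, dim) == value for dim, value in query_dims.items())

    selected = sorted((b for b in blocks if hits(b) >= min_hits), key=lambda b: b.order)
    return "\n".join(b.text for b in selected)


blocks = [
    IndexedBlock("b7", 7, "power transformer", "operating environment",
                 "temperature condition", "Ambient temperature clause"),
    IndexedBlock("b3", 3, "power transformer", "operating environment",
                 "altitude condition", "Altitude clause"),
    IndexedBlock("b9", 9, "bushing", "process", "temperature condition", "Bushing test clause"),
]
dims = {"entity": "power transformer", "facet": "operating environment",
        "tag": "temperature condition"}
print(aggregate(blocks, dims))
```

Raising `min_hits` narrows the page to blocks that match more dimensions; `min_hits=3` keeps only `b7` here.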
According to this embodiment, in a query scenario, the request data in a query request is analyzed and the query request is answered using the constructed industry knowledge graph. The query content corresponding to the request (such as an aggregation result obtained by aggregating document content blocks in the document data) can therefore be returned directly. Because query matching based on the industry knowledge graph is accurate, no secondary jump is needed: the query content is obtained directly and the query request is satisfied directly, improving both query precision and query efficiency.
FIG. 3 is a schematic diagram of a knowledge graph construction scenario in an application example according to an embodiment of the present disclosure. As shown in FIG. 3, the scenario includes two aspects. In the first aspect, the server 300 obtains multiple data structures and constructs the industry knowledge graph from them. In the second aspect, the server 300 further updates the industry knowledge graph through interaction with the terminal device 301 (such as responding to query requests in a query scenario), and feeds back query results for the query requests (such as aggregation results obtained by aggregating document content blocks in the document data). The specific process shown in FIG. 3 includes:
S301, based on the raw document content, dividing the document content into multiple granularities, such as document chapters, tables, pictures, formulas and paragraphs, and words and sentences in the document.
S302, performing NER on entities in the data based on a small amount of industry sample data, so as to construct an entity system 311 containing hypernym-hyponym relationships between entities.
S303, extracting facets related to the entity objects from the document data, and extracting facet labels matching the entity objects, so as to construct a facet system 312.
S304, extracting a first semantic tag in the document data to construct a semantic tag system 313.
S305, the server acquires a query request initiated by the terminal equipment.
S306, the server responds to the query request, semantic blocks are obtained based on semantic aggregation in the query request, and document content blocks 314 matched with the semantic blocks are fed back to the terminal equipment.
In some examples, various types of document data (such as PDF, Word, Excel and PPT files) may be parsed using deep learning techniques to form data to be processed at multiple granularities (including words, sentences, paragraphs, chapters, tables, pictures, formulas, charts and the like in the document data). The parsing can be performed by multiple semantic fragment units, each handling a different granularity, to improve processing efficiency. Entity objects are extracted from the document data using NER, complex NER, a general entity extraction model and the like. For example, the multi-granularity data to be processed is matched against a small amount of industry sample data to extract industry (or enterprise) entity objects, and an entity system 311 containing hypernym-hyponym relationships between entities is then constructed based on industry rules or industry knowledge. Facets related to the entity objects are extracted from the document data to construct a facet system 312. For example, facet labels related to an entity object are extracted from chapters, charts and body text, and facet labels matching the entity object are extracted based on the chapter hierarchy, chapter-level relation extraction and the like. For tables, cells are extracted, and key-value (KV) pairs or SPO values are extracted based on the row-column structure (that is, subject-predicate-object triples or other multi-tuple information obtained from the table). First semantic tags in the document data are also extracted, for example to construct a semantic tag system 313 based on a general concept graph (that is, a mapping that includes the first semantic tags).
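Extracting SPO triples or KV pairs from a table's row-column structure can be sketched in Python as below. The sketch assumes the table has already been parsed into a list of rows of cell strings. For SPO output, the first row is taken as the header and the first column as the subject. For KV output, a two-column parameter/value table with no header row is assumed. The function names and the sample rows (models "T1" and "T2") are invented for illustration.

```python
def table_to_spo(rows: list[list[str]]) -> list[tuple[str, str, str]]:
    """Turn a parsed table into (subject, predicate, object) triples.

    Assumes a non-empty table whose first row is the header and whose first
    column names the subject of each row; empty cells are skipped.
    """
    header, *body = rows
    triples = []
    for row in body:
        subject = row[0].strip()
        for predicate, cell in zip(header[1:], row[1:]):
            if cell.strip():
                triples.append((subject, predicate.strip(), cell.strip()))
    return triples


def table_to_kv(rows: list[list[str]]) -> dict[str, str]:
    """Collapse a two-column (parameter | value) table without header into key-value pairs."""
    return {key.strip(): value.strip() for key, value in rows if key.strip()}


spec = [["Model", "Rated voltage", "Cooling"],
        ["T1", "110 kV", "ONAN"],
        ["T2", "110 kV", ""]]
print(table_to_spo(spec))
print(table_to_kv([["Rated frequency", "50 Hz"], ["Phases", "3"]]))
```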
In some examples, after acquiring a query request, the server performs response processing. Semantic blocks are obtained based on semantic aggregation in the query request, and the document content blocks 314 matching those semantic blocks are finally fed back to the terminal device. Unlike the offline processing of the first aspect, this response processing may be performed online by an index semantic unit. For example, a second semantic tag may be derived from the user query intention described in the query request, and the industry knowledge graph may be updated according to that tag. The multi-dimensional query index (which includes, for example, at least two of the entity object, the first sub-object and the second sub-object) is matched against the industry knowledge graph according to the request data and the user query intention it describes, yielding multi-dimensional semantic blocks. Multiple document content blocks in the document data corresponding to the request data are then aggregated according to these semantic blocks, and the aggregation result is displayed in the same document page. A semantic block is a complete unit that meets the requirements of a business application scenario. In a query scenario, for example, aggregating semantic blocks therefore lets the system return directly the most complete document content block related to the query request, satisfying the query requirement without a secondary jump to the original text.
Optionally, with the knowledge graph construction shown in FIG. 3, multiple document content blocks in the document data can be associated through various combinations of the multi-dimensional query index formed by the entity object, the first sub-object (such as a facet) and the second sub-object (such as the first and second semantic tags). This yields an industry knowledge graph with multi-dimensional content. It also lets query requests be answered quickly, query requirements be met directly, and matching query results be fed back directly.
FIG. 4 is a schematic diagram of constructing a multi-layer business architecture based on parsing of document data in an application example according to an embodiment of the present disclosure. As shown in FIG. 4, fragmented content of the document data (documents, chapters, tables, paragraphs, pictures and the like) is divided and parsed at multiple granularities to obtain multiple data formats. The industry knowledge graph is constructed based on these data formats (such as at least two of the first data format, the second data format and the third data format). The industry knowledge graph can further be incorporated into a business event graph. It is then provided to the business core objects that upper-layer business applications use for document data processing, such as business core objects covering querying and recommendation.
Data structures involving entities, facets and the like are illustrated in Table 1. As shown in Table 1, the business knowledge graph is constructed using multiple data formats (such as at least two of the first data format, the second data format and the third data format). The document data is reorganized and identified by entities, facets, components and the like, converting unordered, semi-structured document data into ordered, structured document data. This improves the precision and efficiency of responses to query requests in query scenarios.
TABLE 1
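The conversion from semi-structured to structured data can be sketched as grouping flat records into a nested entity → facet → component hierarchy. The record layout and the function `reorganize` are assumptions for illustration; the actual fields of table 1 are not reproduced here.

```python
def reorganize(records):
    """Group flat (entity, facet, component, text) records into a nested,
    ordered structure: entity -> facet -> component -> list of texts.

    Python dicts preserve insertion order (3.7+), so the output follows the
    order in which entities, facets and components first appear.
    """
    tree = {}
    for entity, facet, component, text in records:
        facets = tree.setdefault(entity, {})
        components = facets.setdefault(facet, {})
        components.setdefault(component, []).append(text)
    return tree
```

Texts that were scattered across the source (for example two paragraphs about the same component in different chapters) end up under one node, which is what makes the reorganized data directly queryable.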
Table 2 illustrates a data structure including entities, facets, etc., on which queries of the document data can be based, and table 3 illustrates a document content block obtained by querying the document data "110 kV SF6 gas-insulated three-phase double-winding AC power transformer technical specification". As shown in tables 2-3, the query request may include the request data "110 kV power transformer"; the industry knowledge graph is matched according to the request data and the user's query intention to obtain matched semantic blocks. After the semantic blocks are aggregated, the result can jump directly to the corresponding position of the document content block in the corresponding document data "110 kV SF6 gas-insulated three-phase double-winding AC power transformer technical specification" (for example, the "4.1.2 Ambient temperature" part displayed on the right side of table 3). In this way, the most complete document content block related to the query request is fed back directly through aggregation of the semantic blocks, and the query requirement is met without a second jump to the original text.
TABLE 2
TABLE 3
Fig. 5 is a system architecture diagram for constructing a knowledge graph and its business applications in an application example according to an embodiment of the present disclosure. As shown in fig. 5, the industry knowledge graph construction module in the system architecture implements the construction of the industry knowledge graph. Specifically, a structure analysis module obtains multi-granularity data to be processed from various document data, and a content understanding module performs various extraction processes, semantic analysis, etc.; the industry knowledge graph construction module then builds the industry knowledge graph from the outputs of the structure analysis module and the content understanding module. Furthermore, the graph can interact online with modules in each upper-layer business application (one or more query modules, one or more recommendation modules, etc.) to update the industry knowledge graph, and can provide accurate and efficient responses to each upper-layer business application (relevance ranking of queries, matching of query content against the industry knowledge system, content and knowledge recommendation based on users or usage scenarios, etc.).
With this application example, the industry knowledge graph does not need to be constructed manually, and industry experts do not need to construct and label industry data samples. Instead, multiple data formats are obtained from the existing document data of the enterprise/industry through entity, facet and tag extraction, etc., and the industry knowledge graph is constructed from them. An industry knowledge graph conforming to the enterprise/industry knowledge system can thus be built completely in a short time without relying on the specialized background knowledge of industry experts, more efficiently meeting the needs of upper-layer business applications such as basic content and knowledge retrieval, knowledge recommendation, knowledge question answering and knowledge organization. In addition, fragmentation and aggregation of the document data form semantic blocks that directly meet the query requirement; the document content blocks corresponding to those semantic blocks are fed back directly and their positions located, which avoids excessive jumps to the original text and enables fine-grained querying and positioning within massive semi-structured document content.
According to an embodiment of the present disclosure, a knowledge graph construction apparatus is provided. Fig. 6 is a schematic diagram of the composition structure of the knowledge graph construction apparatus according to an embodiment of the present disclosure. As shown in fig. 6, the knowledge graph construction apparatus includes: a parsing module 601, configured to parse first document data to obtain multiple types of data to be processed; a matching module 602, configured to match the data to be processed with industry sample data to obtain an entity object matched with the industry sample data, where the entity object represents a classification of industry knowledge; a first extraction module 603, configured to extract a first sub-object related to the entity object from the data to be processed; and a first construction module 604, configured to construct an industry knowledge graph according to the entity object and the first sub-object.
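The matching module can be approximated, for illustration only, by comparing parsed fragments against a dictionary of known surface forms per entity. Representing "industry sample data" as such an alias dictionary, using the fragment's heading as the first sub-object, and the function `match_entities` itself are all assumptions; the disclosure does not specify the matching technique.

```python
def match_entities(fragments, industry_samples):
    """Hypothetical matching module.

    fragments: list of (facet_heading, text) pairs from the parsing step.
    industry_samples: dict mapping an entity (industry knowledge category)
        to its known surface forms, standing in for industry sample data.
    Returns (entity, facet, text) records for every fragment mentioning at
    least one surface form; the heading serves as the first sub-object.
    """
    records = []
    for facet, text in fragments:
        lowered = text.lower()
        for entity, aliases in industry_samples.items():
            # Case-insensitive substring match against each alias.
            if any(alias.lower() in lowered for alias in aliases):
                records.append((entity, facet, text))
    return records
```

Substring matching is deliberately simple and will over-match (for example "transformer" inside "transformers"); a production matcher would more likely use a trained entity recognizer or tokenized matching.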
In one embodiment, the apparatus further comprises a second construction module, configured to construct a first data structure for querying according to the entity object, and to construct a second data structure for querying according to the first sub-object.
In one embodiment, the first construction module 604 is configured to reorganize the data structure of the first document data according to the first data structure and the second data structure to obtain second document data, and to construct the industry knowledge graph according to the second document data.
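One way to picture the first and second data structures and the resulting graph is the sketch below, which builds an entity → facets table and an (entity, facet) → blocks table and then emits (head, relation, tail) edges. The relation names `has_facet` and `has_block`, the `entity/facet` node naming and the function `build_graph` are hypothetical choices, not taken from the disclosure.

```python
from collections import defaultdict


def build_graph(records):
    """records: iterable of (block_id, entity, facet) tuples.

    Builds a first data structure (entity -> ordered facets) and a second
    data structure ((entity, facet) -> block ids), then emits the graph as
    (head, relation, tail) edges.
    """
    entity_facets = defaultdict(list)  # first data structure
    facet_blocks = defaultdict(list)   # second data structure
    for block_id, entity, facet in records:
        if facet not in entity_facets[entity]:
            entity_facets[entity].append(facet)
        facet_blocks[(entity, facet)].append(block_id)

    edges = []
    for entity, facets in entity_facets.items():
        for facet in facets:
            # Qualify the facet node with its entity so that identical facet
            # names under different entities stay distinct nodes.
            facet_node = f"{entity}/{facet}"
            edges.append((entity, "has_facet", facet_node))
            for block_id in facet_blocks[(entity, facet)]:
                edges.append((facet_node, "has_block", block_id))
    return edges
```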
In one embodiment, the apparatus further comprises a second extraction module, configured to extract a second sub-object from the data to be processed, where the second sub-object is a first semantic tag obtained through semantic analysis.
In one embodiment, the apparatus further comprises a third construction module, configured to: construct a third data structure for querying according to the second sub-object; reorganize the data structure of the first document data according to the third data structure to obtain third document data; and construct the industry knowledge graph according to the third document data.
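As a rough illustration of the second sub-object and the third data structure, the sketch below substitutes keyword rules for semantic analysis and indexes blocks by the resulting tags. The rule format, `extract_tags` and `build_tag_index` are assumptions; a real system would presumably use a semantic model, and this version only recognizes single-word triggers.

```python
import re
from collections import defaultdict


def extract_tags(text, tag_rules):
    """Stand-in for semantic analysis: a tag applies when any of its
    single-word triggers occurs in the text as a whole word
    (case-insensitive)."""
    words = set(re.findall(r"\w+", text.lower()))
    return sorted(
        tag for tag, triggers in tag_rules.items()
        if words & {t.lower() for t in triggers}
    )


def build_tag_index(blocks, tag_rules):
    """Third data structure: semantic tag -> ids of blocks carrying it."""
    index = defaultdict(list)
    for block_id, text in blocks:
        for tag in extract_tags(text, tag_rules):
            index[tag].append(block_id)
    return dict(index)
```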
In one embodiment, the apparatus further comprises an updating module, configured to perform semantic analysis on an acquired query request to obtain a second semantic tag described by the query request, and to update the industry knowledge graph according to the second semantic tag.
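A minimal sketch of the update step, assuming the graph is reduced to entity nodes that accumulate counts of semantic tags seen in queries. The class `TagGraph`, the keyword rules used in place of semantic analysis, and the idea of weighting tags by query frequency are illustrative assumptions only.

```python
from collections import Counter, defaultdict


class TagGraph:
    """Hypothetical minimal graph: entity node -> Counter of semantic tags."""

    def __init__(self):
        self.entity_tags = defaultdict(Counter)

    def update_from_query(self, query, entities, tag_rules):
        """Attach the query's semantic tags (second semantic tags) to every
        known entity the query mentions; repeated queries raise the count.
        Matching is a case-insensitive substring test, so it can over-match."""
        lowered = query.lower()
        tags = [tag for tag, triggers in tag_rules.items()
                if any(t.lower() in lowered for t in triggers)]
        mentioned = [e for e in entities if e.lower() in lowered]
        for entity in mentioned:
            for tag in tags:
                self.entity_tags[entity][tag] += 1
        return mentioned, tags
```

Counting rather than simply adding tags lets frequently requested aspects of an entity be ranked higher in later queries.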
In one embodiment, the apparatus further comprises an aggregation module, configured to: acquire request data in a query request; match the request data and a multi-dimensional query index against the industry knowledge graph to obtain multi-dimensional semantic blocks; aggregate, according to the multi-dimensional semantic blocks, a plurality of document content blocks in the document data corresponding to the request data to obtain an aggregation result; and display the aggregation result in the same document page; where the multi-dimensional query index comprises at least two of the entity object, the first sub-object and the second sub-object.
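The aggregation step can be sketched as grouping matched blocks by source document, sorting them by position and merging blocks whose line ranges overlap or touch, then rendering everything on one page with a jump anchor per merged block. The tuple layout, `aggregate` and `render_page` are hypothetical; the disclosure does not state how blocks are merged.

```python
from itertools import groupby


def aggregate(blocks):
    """blocks: (doc_id, start_line, end_line, text) for matched semantic blocks.

    Groups blocks by document, sorts them by start line and merges blocks
    whose line ranges overlap or are adjacent.
    """
    merged = []
    ordered = sorted(blocks, key=lambda b: (b[0], b[1]))
    for doc_id, group in groupby(ordered, key=lambda b: b[0]):
        current = None
        for _, start, end, text in group:
            if current is not None and start <= current[2] + 1:
                current[2] = max(current[2], end)  # extend the open range
                current[3].append(text)
            else:
                if current is not None:
                    merged.append(current)
                current = [doc_id, start, end, [text]]
        merged.append(current)  # every group has at least one block
    return [(d, s, e, "\n".join(t)) for d, s, e, t in merged]


def render_page(aggregated):
    """Render all aggregated blocks on a single page; each header gives the
    document id and line range a viewer could jump to."""
    return "\n\n".join(f"[{d}, lines {s}-{e}]\n{text}" for d, s, e, text in aggregated)
```

For instance, two adjacent hits at lines 10-12 and 13-15 of the same specification collapse into one block spanning lines 10-15, so the reader sees one continuous passage instead of two separate results.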
In the technical solution of the present disclosure, the acquisition, storage and application of any user personal information involved comply with relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, such as the knowledge graph construction method. For example, in some embodiments, the knowledge graph construction method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the knowledge graph construction method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the knowledge graph construction method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions of the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (12)

1. A knowledge graph construction method, comprising:
analyzing the first document data to obtain various types of data to be processed;
matching the data to be processed with industry sample data to obtain entity objects matched with the industry sample data, wherein the entity objects are used for representing classification of industry knowledge;
extracting a first sub-object related to the entity object from the data to be processed;
constructing an industry knowledge graph according to the entity object and the first sub-object;
wherein the method further comprises:
extracting a second sub-object from the data to be processed, wherein the second sub-object is a first semantic tag obtained through semantic analysis;
acquiring request data in a query request;
matching the request data and the multidimensional query index with the industry knowledge graph to obtain a multidimensional semantic block;
according to the multi-dimensional semantic blocks, aggregating a plurality of document content blocks in document data corresponding to the request data to obtain an aggregation result;
displaying the aggregation result in the same document page;
the multi-dimensional query index comprises at least two of the entity object, the first sub-object and the second sub-object.
2. The method of claim 1, further comprising:
constructing a first data structure for querying according to the entity object;
and constructing a second data structure for querying according to the first sub-object.
3. The method of claim 2, wherein the constructing an industry knowledge graph from the entity object and the first sub-object comprises:
reorganizing the data structure of the first document data according to the first data structure and the second data structure to obtain second document data;
and constructing the industry knowledge graph according to the second document data.
4. The method of claim 1, further comprising:
constructing a third data structure for querying according to the second sub-object;
reorganizing the data structure of the first document data according to the third data structure to obtain third document data;
and constructing the industry knowledge graph according to the third document data.
5. The method of claim 1, further comprising:
performing semantic analysis on an acquired query request to obtain a second semantic tag described by the query request;
and updating the industry knowledge graph according to the second semantic tag.
6. A knowledge graph construction apparatus comprising:
a parsing module, configured to parse first document data to obtain multiple types of data to be processed;
a matching module, configured to match the data to be processed with industry sample data to obtain entity objects matched with the industry sample data, the entity objects representing a classification of industry knowledge;
a first extraction module, configured to extract a first sub-object related to the entity object from the data to be processed;
a first construction module, configured to construct an industry knowledge graph according to the entity object and the first sub-object;
a second extraction module, configured to extract a second sub-object from the data to be processed, wherein the second sub-object is a first semantic tag obtained through semantic analysis;
an aggregation module for:
acquiring request data in a query request;
matching the request data and the multidimensional query index with the industry knowledge graph to obtain a multidimensional semantic block;
according to the multi-dimensional semantic blocks, aggregating a plurality of document content blocks in document data corresponding to the request data to obtain an aggregation result;
displaying the aggregation result in the same document page;
the multi-dimensional query index comprises at least two of the entity object, the first sub-object and the second sub-object.
7. The apparatus of claim 6, further comprising a second construction module configured for:
constructing a first data structure for querying according to the entity object;
and constructing a second data structure for querying according to the first sub-object.
8. The apparatus of claim 7, wherein the first construction module is configured for:
reorganizing the data structure of the first document data according to the first data structure and the second data structure to obtain second document data;
and constructing the industry knowledge graph according to the second document data.
9. The apparatus of claim 6, further comprising a third construction module configured for:
constructing a third data structure for querying according to the second sub-object;
reorganizing the data structure of the first document data according to the third data structure to obtain third document data;
and constructing the industry knowledge graph according to the third document data.
10. The apparatus of claim 6, further comprising an updating module configured for:
performing semantic analysis on an acquired query request to obtain a second semantic tag described by the query request;
and updating the industry knowledge graph according to the second semantic tag.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5.
CN202210604131.5A 2022-05-30 2022-05-30 Knowledge graph construction method and device, electronic equipment and storage medium Active CN115203428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210604131.5A CN115203428B (en) 2022-05-30 2022-05-30 Knowledge graph construction method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115203428A CN115203428A (en) 2022-10-18
CN115203428B CN115203428B (en) 2023-09-26

Family

ID=83576770

Country Status (1)

Country Link
CN (1) CN115203428B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633044A (en) * 2017-09-14 2018-01-26 国家计算机网络与信息安全管理中心 A kind of public sentiment knowledge mapping construction method based on focus incident
CN113656590A (en) * 2021-07-16 2021-11-16 北京百度网讯科技有限公司 Industry map construction method and device, electronic equipment and storage medium
CN113836314A (en) * 2021-09-18 2021-12-24 北京百度网讯科技有限公司 Knowledge graph construction method, device, equipment and storage medium
CN114495143A (en) * 2021-12-24 2022-05-13 北京百度网讯科技有限公司 Text object identification method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111522967B (en) * 2020-04-27 2023-09-15 北京百度网讯科技有限公司 Knowledge graph construction method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant