CN114443854A - Processing method and device of multi-source heterogeneous data, computer equipment and storage medium - Google Patents

Processing method and device of multi-source heterogeneous data, computer equipment and storage medium

Publication number
CN114443854A
Authority
CN
China
Prior art keywords
service
data
service domain
domain
business
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111646837.XA
Other languages
Chinese (zh)
Inventor
谈樑
李柄坤
朱和胜
康晓琦
刘阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Jingtai Technology Co Ltd
Original Assignee
Shenzhen Jingtai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Jingtai Technology Co Ltd
Priority to CN202111646837.XA
Publication of CN114443854A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of knowledge graphs, and in particular to a method and device for processing multi-source heterogeneous data, computer equipment, and a storage medium. The method comprises: acquiring service data from a trusted source system, the trusted source system comprising the service systems used by the terminals of each service domain; constructing a knowledge graph from the service data, the knowledge graph indicating the relations between the service data of different service domains and the service domain indexes; acquiring, according to the knowledge graph, the resource entities of the service domains to be fused and the relations among those resource entities, so as to construct a service domain knowledge graph library; and performing cross-service-domain and cross-resource-entity synchronization, fusion and sharing of service domain information based on the service domain knowledge graph library. The invention can efficiently extract information from internal and external organizations and across business fields, and quickly and iteratively integrate it into business knowledge.

Description

Processing method and device of multi-source heterogeneous data, computer equipment and storage medium
Technical Field
The invention relates to the technical field of knowledge graph, in particular to a processing method and device of multi-source heterogeneous data, computer equipment and a storage medium.
Background
With the rapid development and application of AI technology, the technical routes of technology enterprises are changing at an accelerating pace. A company with a leading technology can quickly exert a siphon effect on the upstream and downstream of its industrial chain, using a single service line or a single application scenario as the pivot. Against this background, upstream and downstream knowledge and information aggregate and grow explosively, IT product lines must expand rapidly within a short period, and the application scenarios of IT technology expand just as quickly. This places demands such as high efficiency and decision support on the acquisition, fusion and analysis of knowledge from different fields.
As service lines expand rapidly, each development stage often calls for integrating the information systems of internal and external organizations and sharing information among them. For the business processes of a complex industrial chain, traditional ERP is the most mature practical solution. However, ERP usually requires large-scale procurement, deployment and training, and incurs high costs for migrating user habits and administration. When business processes are changing rapidly, it is very difficult to merge the users of existing systems into an ERP quickly and to have the ERP take effect soon after business lines are combined. SaaS can reduce the dependence on ERP to some extent and make the business more flexible, but it is also deeply bound to organizations and user habits, and within one enterprise the data held by different SaaS vendors become isolated islands, which leads to the same problem of high migration costs.
In long-term research on and practice with the prior art, the inventors found that existing research and applications, whether ERP, SaaS or others, pay little attention to how a rapidly developing enterprise can efficiently extract incremental information from internal and external organizations and across business fields, and how it can quickly and iteratively integrate that information into business knowledge. The prior art therefore lacks an effective and efficient means of extracting, organizing and fusing cross-domain data in the management of multi-source heterogeneous enterprise information and converting it into reference indexes for operational decisions.
Disclosure of Invention
In view of the problems and shortcomings of the prior art, the invention provides a method and device for processing multi-source heterogeneous data, computer equipment and a storage medium, which can efficiently extract information from internal and external organizations and across business fields and quickly and iteratively integrate it into business knowledge.
An embodiment of the present application provides a method for processing multi-source heterogeneous data, including:
acquiring service data from a trusted source system; the trusted source system comprises service systems used by terminals of all service domains;
constructing a knowledge graph according to the service data; the knowledge graph can indicate the relation between service data of different service domains and service domain indexes;
acquiring a resource entity of a service domain to be fused and a relation of the resource entity according to the knowledge graph to construct a service domain knowledge graph library;
and performing cross-service-domain and cross-resource-entity synchronization, fusion and sharing of service domain information based on the service domain knowledge graph library.
Optionally, after the service domain information synchronization, fusion, and sharing operations across service domains and across resource entities are performed based on the service domain knowledge graph library, the method further includes:
verifying the quality of the service data of each service domain according to the preconfigured service domain indexes and trusted source indexes corresponding to each service domain; wherein
the service domain indexes and the trusted source indexes are both user-defined indexes for measuring the service data of different service domains, and the trusted source indexes serve as the reference standard against which the service domain indexes are compared.
Optionally, the constructing a knowledge graph according to the service data includes:
extracting metadata of the service data to establish standardized graph data; wherein the standardized graph data comprises metadata entities and entity relationships;
and constructing a service domain knowledge graph according to the standardized graph data.
Optionally, the extracting metadata of the service data to establish standardized graph data includes:
when the service data is existing service data, extracting metadata of the existing service data according to a service domain index model to establish the standardized graph data;
wherein the service domain index model is configured with service domain indexes associated with the respective service domains, each of the service domain indexes being assigned a respective weight.
Optionally, the extracting metadata of the service data to establish standardized graph data further includes:
when the service data is incremental service data, if the incremental service data belongs to a brand-new service domain, extracting metadata of the incremental service data to establish new standardized graph data;
if the incremental service data belongs to an existing service domain, extracting metadata of the incremental service data; and
verifying the metadata according to the service domain indexes and trusted source indexes corresponding to the incremental service data, and establishing new standardized graph data or incrementally merging into the existing standardized graph data according to the verification result.
Optionally, after the extracting the metadata from the service data, the method further includes:
and performing preset data format conversion and persistence on the metadata.
Optionally, the resource entity includes service system software information, embedded software information, and hardware device information.
Based on the same inventive concept, an embodiment of the present application further provides a device for processing multi-source heterogeneous data, including:
the acquisition module is used for acquiring service data from the trusted source system; the trusted source system comprises service systems used by terminals of all service domains;
the first construction module is used for constructing a knowledge graph according to the service data; the knowledge graph can indicate the relation between service data of different service domains and service domain indexes;
the second construction module is used for acquiring the resource entities of the service domain to be fused and the relationship of the resource entities according to the knowledge graph so as to construct a service domain knowledge graph library;
and the fusion module is used for performing cross-service-domain and cross-resource-entity synchronization, fusion and sharing of service domain information based on the service domain knowledge graph library.
Optionally, the processing apparatus for multi-source heterogeneous data further includes:
the evaluation module is used for verifying the quality of the service data of each service domain according to the preconfigured service domain indexes and trusted source indexes corresponding to each service domain; wherein
the service domain indexes and the trusted source indexes are both user-defined indexes for measuring the service data of different service domains, and the trusted source indexes serve as the reference standard against which the service domain indexes are compared.
Based on the same inventive concept, an embodiment of the present application further provides a computer device, including: a processor, a memory and a computer program stored on the memory, the processor being coupled to the memory, the processor in operation executing the computer program to implement the method of processing multi-source heterogeneous data as described above.
Based on the same inventive concept, an embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores computer instructions, and when the computer instructions are executed by a computer, the computer executes the instructions of the above-mentioned multi-source heterogeneous data processing method.
One of the above technical solutions has the following advantages and beneficial effects:
according to the embodiments of the application, a knowledge graph corresponding to the service data acquired from the trusted source system is established through knowledge extraction technology. Then, according to the knowledge graph, the resource entities of the service domains to be fused and the relations among those resource entities are acquired to construct a service domain knowledge graph library. Cross-service-domain and cross-resource-entity synchronization, fusion and sharing of service domain information are then performed based on the service domain knowledge graph library. In this way, during the rapid iteration and integration of an enterprise's business processes, cross-domain data can be efficiently extracted, organized, fused and converted into reference indexes for operational decisions.
Drawings
The present application will now be described with reference to the accompanying drawings. The drawings in the present application are for the purpose of illustrating embodiments only. Other embodiments can be readily made by those skilled in the art from the following description of the steps described without departing from the principles of the present application.
FIG. 1 is a schematic structural diagram of a system for managing multi-source heterogeneous data according to an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating a method for processing multi-source heterogeneous data according to an embodiment of the present application;
FIG. 3 is a schematic flow chart illustrating a method for processing multi-source heterogeneous data according to an embodiment of the present application;
FIG. 4 is a schematic illustration of a core value process map in an embodiment of the present application;
FIG. 5 is a diagram illustrating an incremental service domain in one embodiment of the present application;
FIG. 6 is a diagram illustrating a process for building a domain-of-business based knowledge graph in an embodiment of the present application;
FIG. 7 is a schematic flow chart illustrating a method for processing multi-source heterogeneous data according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the graph principle of the trusted source index configuration unit according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an apparatus for processing multi-source heterogeneous data according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a device for processing multi-source heterogeneous data according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The graph is one of the most powerful frameworks in data structures and algorithms: an abstract network of vertices and edges. In a specific scenario, by defining and describing the vertices and edges appropriately, a semantic network of the relationships among abstract entities in the objective world can be constructed.
The knowledge graph is the most widely used application built on graph theory. A conventional knowledge graph is logically divided into a schema layer and a data layer: the data layer expresses facts as a series of (entity, relation, entity) triples, and the schema layer defines the rules for describing those facts. Building a knowledge graph generally involves knowledge representation, knowledge extraction and knowledge fusion. In the field of multi-source heterogeneous enterprise information management, research on knowledge graphs and graph theory has mainly focused on relationships between enterprises: association information among enterprises is extracted by crawling and organizing public information, providing analytical reference for due diligence and supervision. In the field of internal enterprise management, research has focused more on knowledge management: graphs are built from the processes, culture and other knowledge of mature enterprises to make data queries and process execution more efficient.
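The split between the data layer and the schema layer can be illustrated with a minimal Python sketch. The entity names, relation names and the type table below are hypothetical sample data chosen for illustration; they are not taken from the application.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact in the data layer: (entity, relation, entity)."""
    head: str
    relation: str
    tail: str


# Schema layer (assumed): allowed (head type, relation, tail type) patterns.
SCHEMA = {
    ("System", "belongs_to", "ServiceDomain"),
    ("Index", "measures", "ServiceDomain"),
}

# Hypothetical entity-to-type table used to check facts against the schema.
ENTITY_TYPES = {
    "LIMS": "System",
    "laboratory": "ServiceDomain",
    "turnaround_time": "Index",
}


def conforms(triple: Triple) -> bool:
    """Return True if the fact matches a pattern declared in the schema layer."""
    pattern = (
        ENTITY_TYPES.get(triple.head),
        triple.relation,
        ENTITY_TYPES.get(triple.tail),
    )
    return pattern in SCHEMA


facts = [
    Triple("LIMS", "belongs_to", "laboratory"),
    Triple("turnaround_time", "measures", "laboratory"),
    Triple("LIMS", "measures", "turnaround_time"),  # not allowed by the schema
]
valid_facts = [t for t in facts if conforms(t)]
```

Here the schema layer acts as a filter on the data layer: only the first two facts match a declared pattern.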
However, past research and applications have paid little attention to how a rapidly developing enterprise extracts and learns incremental information from internal and external organizations and across business fields. In that scenario, the need to integrate such information into business knowledge quickly and iteratively outweighs the need for conventional enterprise information integration, and it is in this area that the knowledge graph can deliver greater value.
Based on this, in one embodiment of the invention, a management system for multi-source heterogeneous data is provided.
As shown in fig. 1, the management system for multi-source heterogeneous data includes a centralized business domain value index design and application module, a distributed multi-source knowledge extraction architecture, and a multi-source business domain graph evaluation module. Together with the existing systems that supply metadata across the IT services, and any new systems that may be introduced, these form a complete analysis architecture.
The service domain value index design and application module comprises a service domain index and model design unit, a data description rule design unit, a service domain knowledge graph visualization unit, a service domain index change log unit and an intelligent decision report engine.
The business domain index and model design unit provides users with a visual index design tool. Users can define, drag and associate indexes to form a graph-structured business domain index model. The resulting model guides the intelligent decision report engine in extracting and analyzing the data stored in the distributed multi-source knowledge extraction architecture to produce a readable report.
The data description rule design unit unifies the language used to describe indexes, making the system easier for users to understand and analyze.
The business domain knowledge graph visualization unit displays, at a macro level, the associations between the data and indexes of different business domains, so that users can quickly learn business domain knowledge.
The service domain index change log unit records the change of the service domain index model, and a user can conveniently trace back past versions.
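A change log of this kind can be sketched as an append-only list of model snapshots. The class name `IndexModelLog` and the dictionary layout of the index model are assumptions made for illustration; the application does not specify how versions are stored.

```python
import copy


class IndexModelLog:
    """Append-only version history for a service domain index model (sketch)."""

    def __init__(self):
        self._versions = []

    def commit(self, model: dict) -> int:
        """Store a snapshot of the model and return its 1-based version number."""
        self._versions.append(copy.deepcopy(model))
        return len(self._versions)

    def checkout(self, version: int) -> dict:
        """Return a copy of the model as it was at the given version."""
        return copy.deepcopy(self._versions[version - 1])


log = IndexModelLog()
model = {"efficiency": {"lab_turnaround": 0.3}}  # hypothetical index weights
v1 = log.commit(model)
model["efficiency"]["lab_turnaround"] = 0.5      # the user edits a weight
v2 = log.commit(model)
```

Snapshots are deep-copied so that later edits to the live model cannot alter an earlier version.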
The distributed multi-source knowledge extraction architecture comprises a trusted source system data labeling and extraction unit, a trusted source system management unit, a data synchronization and collection service unit and a distributed data storage management unit.
The trusted source system data labeling and extraction unit extracts and processes data from the trusted source systems; the processing includes extracting metadata and converting it uniformly into the data format defined by the data description rule design unit.
The trusted source system management unit manages the metadata source systems that can access the distributed multi-source knowledge extraction architecture, determining which systems can be incorporated into the architecture and whether they have data synchronization permission.
The data synchronization and collection service unit regularly backs up and cleans the distributed data to keep the microservices stable and reliable.
The distributed data storage management unit persists the data in the uniform format so as to ensure the analysis requirement of the upper-layer application on the data.
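Conversion to a uniform format followed by persistence might look like the sketch below. The field names, the mapping table and the choice of JSON Lines as the storage format are assumptions for illustration; the application only states that a preset, uniform data format is used.

```python
import io
import json


def to_uniform(record: dict, source: str, field_map: dict) -> dict:
    """Rename source-specific fields to unified names and tag the origin system."""
    unified = {field_map[k]: v for k, v in record.items() if k in field_map}
    unified["source"] = source
    return unified


def persist(records, stream) -> int:
    """Append records to a text stream as JSON Lines; return the number written."""
    count = 0
    for rec in records:
        stream.write(json.dumps(rec, ensure_ascii=False, sort_keys=True) + "\n")
        count += 1
    return count


# Hypothetical metadata record from a project management system.
raw = {"proj_nm": "Assay-7", "own": "lab-a", "internal_flag": 1}
unified = to_uniform(raw, "project_management", {"proj_nm": "name", "own": "owner"})

buffer = io.StringIO()  # stands in for a file or a distributed store
written = persist([unified], buffer)
```

Fields with no entry in the mapping table, such as `internal_flag` here, are dropped so that every stored record has the same shape regardless of its source.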
The multi-source service domain evaluation module comprises a trusted source index configuration unit and a service domain index comparison and evaluation unit.
The trusted source index configuration unit lets a user design indexes for systems in different fields; these indexes serve as the reference standard for the service domain index comparison and evaluation unit.
When a new graph is created, the service domain index comparison and evaluation unit compares it with the reference graph stored in the trusted source index configuration unit and outputs an evaluation to the user.
The trusted source systems are the service systems used by the terminals of each service domain, including but not limited to a project management system, a sales management system, a laboratory information management system (LIMS), a supply chain management system, a human resources system, and other service systems. It will be appreciated that the trusted source systems support the daily work of end users and store a large amount of siloed data.
In one embodiment, because the service lines being integrated draw on data sources from different IT systems, those sources differ not only in data structure but also in region and network configuration. The distributed multi-source knowledge extraction architecture therefore uses centralized management with distributed service deployment, and each data source system needs its own configuration management.
Based on the distributed multi-source knowledge extraction architecture, distributed data storage and management functions are designed for the trusted source systems, so that system configurations can be managed separately and metadata can be traced and queried.
A trusted source system is defined as a system that has been authenticated by the overall IT system and can be connected for data exchange. It is configured and authorized through the centralized management back end of the multi-source knowledge extraction architecture, and works with the multi-source knowledge extraction services deployed in different environments to carry out data extraction, labeling, cleaning and similar procedures.
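The admission and authorization of trusted source systems can be sketched as a small registry. The class and field names (`SourceSystem`, `TrustedSourceRegistry`, `authenticated`, `can_sync`) are hypothetical; the application does not describe a concrete data model for the management back end.

```python
from dataclasses import dataclass


@dataclass
class SourceSystem:
    name: str
    service_domain: str
    authenticated: bool     # passed the IT system's authority authentication
    can_sync: bool = False  # granted data synchronization permission


class TrustedSourceRegistry:
    """Centralized record of which systems may join the extraction architecture."""

    def __init__(self):
        self._systems = {}

    def register(self, system: SourceSystem) -> bool:
        """Admit a system only if it has been authenticated."""
        if not system.authenticated:
            return False
        self._systems[system.name] = system
        return True

    def may_sync(self, name: str) -> bool:
        """Return True if the system is registered and allowed to synchronize."""
        system = self._systems.get(name)
        return system is not None and system.can_sync


registry = TrustedSourceRegistry()
lims_ok = registry.register(
    SourceSystem("LIMS", "laboratory", authenticated=True, can_sync=True)
)
crm_ok = registry.register(
    SourceSystem("legacy_crm", "sales", authenticated=False)
)
```

In this sketch, admission and synchronization permission are checked separately, mirroring the two decisions the management unit is described as making.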
The distributed multi-source knowledge extraction service provides functions of metadata query, retrieval and the like, and provides an interface for services of a related application layer to analyze or trace data.
This embodiment makes full use of knowledge graph technology and applies the successful operating indexes and experience of the service domains accumulated during rapid development. It constructs a service domain knowledge graph library through knowledge extraction, imports the entities and entity relationships of the service domains under core management, performs data mining across service domains and service lines, and thereby achieves rapid integration, visual analysis and display of the index data of multi-source heterogeneous service lines.
As shown in fig. 2, based on the foregoing embodiments, an embodiment of the present application provides a method for processing multi-source heterogeneous data, including steps S100 to S400.
Step S100: acquiring service data from the trusted source system. The trusted source system comprises the service systems used by the terminals of the respective service domains.
Step S200: constructing a knowledge graph according to the service data. The knowledge graph indicates the relations between the service data of different service domains and the service domain indexes.
As shown in fig. 3, step S200 includes:
Step S211: extracting metadata of the service data to establish standardized graph data. The standardized graph data includes metadata entities and entity relationships.
It can be understood that after the metadata is extracted from the service data, it needs to be converted into a preset data format and persisted.
Step S212: constructing a service domain knowledge graph according to the standardized graph data.
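Step S211 can be illustrated with a minimal sketch that turns metadata into entities and relations. It assumes, purely for illustration, that the metadata consists of table and column names; the relation labels `has_table` and `has_field` are likewise hypothetical.

```python
def extract_graph_data(system: str, tables: dict) -> dict:
    """Turn table/column metadata into entities and (head, relation, tail) relations."""
    entities = {system}
    relations = set()
    for table, columns in tables.items():
        table_id = f"{system}.{table}"
        entities.add(table_id)
        relations.add((system, "has_table", table_id))
        for column in columns:
            column_id = f"{table_id}.{column}"
            entities.add(column_id)
            relations.add((table_id, "has_field", column_id))
    return {"entities": entities, "relations": relations}


# Hypothetical metadata from a laboratory system with one table.
graph_data = extract_graph_data("LIMS", {"sample": ["id", "status"]})
```

The returned dictionary is one possible shape for standardized graph data: a set of metadata entities plus a set of entity relationships, which step S212 could then load into a graph.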
In one embodiment, when the service data is existing service data, metadata extraction is performed on the existing service data according to a service domain index model to establish standardized graph data.
Wherein the service domain indicator model is configured with service domain indicators associated with respective service domains, each of the service domain indicators being assigned a respective weight.
For a specific service line fusion mode, the service domain knowledge graph visualization unit in fig. 1 can be used to quickly construct the index hierarchy required for decision-making and the measurement indexes of the associated service domains. For example:
when an efficiency index model is constructed during service line integration, the model's name, version and the efficiency-related indexes of each associated service domain can be defined in the human-machine interface of the service domain index and model design unit, either manually or by importing them from a decision system, and each index is given a weight, forming (operation index, weight, service domain index) triples. The model then guides the distributed multi-source knowledge extraction architecture in extracting service domain metadata, and guides the design of the extraction methods and rules for the systems in which the metadata resides.
Specifically, the service domain indexes and the data collected by the distributed multi-source knowledge extraction architecture are presented in a visual graph panel, where a user of the analysis system can drag, associate and define them, finally forming a graph-based index model for the service line, as shown in fig. 4.
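The (operation index, weight, service domain index) triples can be written down directly, as in the sketch below. The index names, the weights and the use of a normalized weighted sum to combine them are assumptions for illustration; the application does not state how the weights are aggregated.

```python
# (operation index, weight, service domain index); all values are hypothetical.
efficiency_model = [
    ("efficiency", 0.5, "project_on_time_rate"),
    ("efficiency", 0.3, "lab_turnaround_score"),
    ("efficiency", 0.2, "supply_fill_rate"),
]


def weighted_score(model, observed: dict) -> float:
    """Combine observed index values using the model weights (normalized)."""
    total_weight = sum(weight for _, weight, _ in model)
    weighted_sum = sum(weight * observed[index] for _, weight, index in model)
    return weighted_sum / total_weight


observed = {
    "project_on_time_rate": 0.8,
    "lab_turnaround_score": 0.6,
    "supply_fill_rate": 1.0,
}
score = weighted_score(efficiency_model, observed)  # approximately 0.78
```

Dividing by the total weight keeps the score comparable even when a user assigns weights that do not sum to one.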
In one embodiment, when the service data is incremental service data and belongs to a brand-new service domain, metadata is extracted from the incremental service data to establish new standardized graph data. If the incremental service data belongs to an existing service domain, metadata is extracted from the incremental service data.
The metadata is then verified according to the service domain indexes and trusted source indexes corresponding to the incremental service data.
According to the verification result, either new standardized graph data is established or the metadata is incrementally merged into the existing standardized graph data.
Fig. 5 illustrates an incremental service domain. When service line integration begins, data from each newly introduced information system is extracted and collected through the multi-source extraction architecture. The first step is to construct a knowledge graph for the incremental information system. If the system belongs to a brand-new field, a new graph must be constructed separately. If it belongs to an existing field, its data can first be extracted and compared to produce a specific multi-source service domain evaluation, after which the multi-source information extraction architecture decides whether to rebuild the graph or merge into it incrementally.
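The branching described for incremental data can be sketched as a small decision function. The numeric evaluation score and the threshold of 0.6 are assumptions introduced for illustration; the application says only that the choice depends on the verification or evaluation result.

```python
def plan_increment(domain: str, score: float, existing_domains: set,
                   threshold: float = 0.6) -> str:
    """Decide how to handle incremental data from a newly introduced system.

    `score` is an assumed evaluation result in [0, 1]; `threshold` is a
    hypothetical cut-off, not a value given in the application.
    """
    if domain not in existing_domains:
        return "build_new_graph"        # brand-new field: construct separately
    if score >= threshold:
        return "merge_incrementally"    # consistent with the existing graph
    return "rebuild_graph"              # too divergent: reconstruct the graph


existing = {"laboratory", "sales"}
plan_new = plan_increment("supply_chain", 0.9, existing)
plan_merge = plan_increment("laboratory", 0.8, existing)
plan_rebuild = plan_increment("sales", 0.2, existing)
```

Note that the score is ignored for a brand-new domain, since there is no existing graph to compare against.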
Fig. 6 gives an example of building a knowledge graph based on service domains. Once the standardized graph data has been constructed, a knowledge graph can be formed based on the corresponding service domain index model. The intelligent decision report engine in fig. 1 then cleans the data and performs calculations on it. For example, the stored data is computed according to the specific design of the service domain index model, the indexes or metadata entities associated with each service domain index entity, and the connection weights of the triples, and a report is finally generated in the user interface. This yields the contribution of the incremental service domain to the overall operation of the service line currently being integrated. By observing and analyzing the indexes, and by using the query tool and the graph tracing tool, a user can find out which specific system indexes affect the overall index or produce a positive benefit.
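One way a report could rank which system indexes matter most is a per-index contribution breakdown, sketched below. Treating a contribution as weight times value is an assumption for illustration, and the index names and numbers are hypothetical.

```python
def contributions(weights: dict, values: dict) -> list:
    """Return (index, weight * value) pairs, largest contribution first."""
    parts = [(name, weights[name] * values[name]) for name in weights]
    return sorted(parts, key=lambda item: item[1], reverse=True)


# Hypothetical connection weights and observed index values.
weights = {"lims_throughput": 0.5, "sales_cycle": 0.3, "hr_onboarding": 0.2}
values = {"lims_throughput": 0.4, "sales_cycle": 0.9, "hr_onboarding": 0.5}

ranked = contributions(weights, values)
top_index = ranked[0][0]
```

In this sample the highest-weighted index is not the largest contributor, which is the kind of observation the report is meant to surface.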
Step S300: acquiring, according to the knowledge graph, the resource entities of the service domains to be fused and the relations among the resource entities, to construct a service domain knowledge graph library.
In one embodiment, the resource entities include service system software information, embedded software information, and hardware device information.
Step S400: performing cross-service-domain and cross-resource-entity synchronization, fusion and sharing of service domain information based on the service domain knowledge graph library.
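A simple form of cross-domain fusion is to find the resource entities that more than one service domain refers to. The sketch below assumes the graph library is a mapping from service domain to a list of triples; the entity names (software, firmware and hardware) are hypothetical examples of the three resource entity kinds.

```python
from collections import defaultdict


def fuse(domain_graphs: dict) -> dict:
    """Map each resource entity to the set of service domains that reference it."""
    owners = defaultdict(set)
    for domain, triples in domain_graphs.items():
        for head, _relation, tail in triples:
            owners[head].add(domain)
            owners[tail].add(domain)
    return dict(owners)


def shared_entities(domain_graphs: dict) -> set:
    """Return the entities referenced by more than one service domain."""
    return {entity for entity, domains in fuse(domain_graphs).items()
            if len(domains) > 1}


# Hypothetical graph library: software, embedded software and hardware entities.
library = {
    "laboratory": [("LIMS", "runs_on", "server-01"),
                   ("LIMS", "controls", "sequencer-fw")],
    "supply_chain": [("SCM", "runs_on", "server-01")],
}
shared = shared_entities(library)
```

Shared entities such as `server-01` are natural join points for synchronizing and sharing information between the two domains.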
As shown in fig. 7, after step S400, step S500 is further included.
Step S500: verifying the quality of the service data of each service domain according to the preconfigured service domain indexes and trusted source indexes corresponding to each service domain.
The service domain indexes and the trusted source indexes are both user-defined indexes for measuring the service data of different service domains, and the trusted source indexes serve as the reference standard against which the service domain indexes are compared.
Fig. 8 illustrates the graph principle of the trusted source index configuration unit. The information structure of a trusted source system often embodies a mature methodology of its business field, so a quantitative evaluation method can be built around the trusted source system and the industry it belongs to; this method is itself a knowledge graph based on graph data. When an incremental data source is newly introduced into the analysis system, an integrated service domain index model is constructed and its evaluation score is taken into account, so that a decision maker can judge whether the service domain index model is reasonable.
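An evaluation score against a reference graph could be computed in many ways. The sketch below uses Jaccard similarity over triple sets, which is a swapped-in technique chosen for illustration and not the scoring method of the application; the triples are hypothetical.

```python
def evaluation_score(candidate: set, reference: set) -> float:
    """Jaccard similarity between two sets of (head, relation, tail) triples."""
    if not candidate and not reference:
        return 1.0
    return len(candidate & reference) / len(candidate | reference)


# Hypothetical reference graph configured from a trusted source system.
reference = {
    ("sample", "has_index", "turnaround_time"),
    ("sample", "has_index", "pass_rate"),
    ("instrument", "has_index", "utilization"),
}
# Hypothetical graph built from a newly introduced data source.
candidate = {
    ("sample", "has_index", "turnaround_time"),
    ("sample", "has_index", "pass_rate"),
    ("sample", "has_index", "storage_cost"),
}
score = evaluation_score(candidate, reference)  # 2 shared of 4 distinct triples
```

A score near 1 would suggest that the new index model closely follows the reference methodology, while a low score flags a model for closer review by the decision maker.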
Based on this embodiment, cross-domain data can be efficiently extracted, organized and fused during the rapid iteration and integration of an enterprise's business processes, and converted into reference indexes for operational decisions.
As shown in fig. 9, based on the same inventive concept, an embodiment of the present application further provides an apparatus for processing multi-source heterogeneous data, including:
An obtaining module 10, configured to obtain service data from the trusted source system, wherein the trusted source system comprises the service systems used by the terminals of the respective service domains.
A first constructing module 20, configured to construct a knowledge graph according to the service data. The knowledge graph can indicate the relation between the service data of different service domains and the service domain indexes.
The first building module 20 is configured to perform metadata extraction on the service data to build standardized graph data, and build a service domain knowledge graph according to the standardized graph data. Wherein the standardized graph data includes metadata entities and entity relationships.
It can be understood that after the metadata extraction is performed on the service data, the metadata needs to be subjected to preset data format conversion and persistence.
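The extraction, format conversion and persistence described above might look like the following sketch. The record layout, the `domain:name` identifier scheme, the `references` relationship type, and the choice of JSON as the preset format are all assumptions made for illustration.

```python
import json


def to_standard_graph(domain, records):
    """Turn raw source descriptions into metadata entities and relationships.

    Each record is a dict with a "name", an optional list of "fields", and
    an optional list of "references" to other objects in the same domain.
    """
    entities, relationships = [], []
    for rec in records:
        entity_id = f"{domain}:{rec['name']}"
        entities.append({
            "id": entity_id,
            "domain": domain,
            "fields": sorted(rec.get("fields", [])),
        })
        for target in rec.get("references", []):
            relationships.append({
                "source": entity_id,
                "type": "references",
                "target": f"{domain}:{target}",
            })
    return {"entities": entities, "relationships": relationships}


def persist(graph, path):
    """Write the graph to disk in a fixed JSON layout (the assumed preset format)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(graph, fh, ensure_ascii=False, indent=2, sort_keys=True)


# Hypothetical metadata from a sales system.
graph = to_standard_graph("sales", [
    {"name": "order", "fields": ["id", "customer_id"], "references": ["customer"]},
    {"name": "customer", "fields": ["id"]},
])
```

Calling `persist(graph, "sales_graph.json")` would store the result; a graph database could replace the file without changing the entity and relationship shapes.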
In an embodiment, the first building module 20 is further configured to, when the service data is existing service data, perform metadata extraction on the existing service data according to a service domain index model to build standardized graph data. Wherein the service domain indicator model is configured with service domain indicators associated with respective service domains, each of the service domain indicators being assigned a respective weight.
For a specific business line fusion mode, the index hierarchy required for decision-making and the evaluation indexes of the associated service domains can be quickly constructed using the service domain knowledge graph visualization unit in fig. 1. For example:
when an efficiency index model is constructed during business line integration, the model's name and version and the efficiency-related indexes of each associated service domain can be defined in the human-machine interface for service domain index and model design, either manually or by import from a decision system, and given different weights, thereby forming an (operation index, weight, service domain index) triple. These triples in turn guide the distributed multi-source knowledge extraction framework in extracting service domain metadata, and guide the design of the extraction modes and rules for the systems in which the metadata resides.
Specifically, for the service domain indexes, the data collected by the distributed multi-source knowledge extraction framework is presented in a visual graph panel, where a user of the analysis system can drag, associate and formulate indexes, finally forming a graph-based business index model for the business line, as shown in fig. 4.
In one embodiment, the first building module 20 is further configured to:
when the business data is incremental business data, if the incremental business data belongs to a brand-new business domain, metadata extraction is carried out on the incremental business data to establish new standardized graph data;
if the incremental business data belong to the existing business domain, extracting metadata of the incremental business data;
verifying the metadata according to the service domain index and the trusted source index corresponding to the incremental service data; and
establishing new standardized graph data or incrementally merging into the existing standardized graph data according to the verification result.
Fig. 5 illustrates an incremental service domain. When business line integration begins, data is extracted and collected from the incrementally introduced information system through the multi-source extraction framework. The first step is to construct a knowledge graph of the incremental information system. If it belongs to a brand-new domain, a new graph must be constructed separately. If it belongs to an existing domain, data can first be extracted and compared to form a specific evaluation of the multi-element service domain, after which the multi-source information extraction framework determines whether to rebuild the graph or merge into it incrementally.
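The branching just described can be sketched as a small decision function. The library layout and the function name are hypothetical, and the mapping from verification outcome to action (pass leads to an incremental merge, fail leads to a rebuild) is an assumption; the application only states that the choice depends on the verification result.

```python
def _copy_graph(graph):
    """Copy the entity and relationship sets so callers' data is not mutated."""
    return {
        "entities": set(graph["entities"]),
        "relationships": set(graph["relationships"]),
    }


def integrate_increment(library, domain, new_graph, passes_verification):
    """Decide how an incremental data source enters the graph library.

    library: {domain: {"entities": set, "relationships": set}}
    Returns "new", "merged" or "rebuilt".
    """
    if domain not in library:
        # Brand-new domain: build a separate graph.
        library[domain] = _copy_graph(new_graph)
        return "new"
    if passes_verification:
        # Assumed rule: verified metadata is merged incrementally.
        library[domain]["entities"] |= new_graph["entities"]
        library[domain]["relationships"] |= new_graph["relationships"]
        return "merged"
    # Assumed rule: failed verification triggers a rebuild of the graph.
    library[domain] = _copy_graph(new_graph)
    return "rebuilt"


library = {}
first = integrate_increment(
    library, "sales", {"entities": {"order"}, "relationships": set()}, True)
second = integrate_increment(
    library, "sales",
    {"entities": {"customer"}, "relationships": {("order", "placed_by", "customer")}},
    True)
```

In practice `passes_verification` would come from comparing the extracted metadata against the service domain index and the trusted source index.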
Fig. 6 illustrates an example of building a knowledge graph based on service domains. Once the standardized graph data has been constructed, a knowledge graph can be formed based on the corresponding service domain index model. The intelligent decision report engine in fig. 1 then cleans the data and performs calculations: for example, it computes over the stored data according to the specific design of the service domain index model, the indexes or metadata entities associated with each service domain index entity, and the connection weights of the triples, and finally generates a report on a user interface. This yields the contribution of the incremental service domain to the overall operation of the business line currently being integrated. By observing and analyzing the indexes with a query tool and a graph tracing tool, a user can see which specific system index influences the overall index or produces a positive return.
A second construction module 30, configured to obtain, according to the knowledge graph, the resource entities of the service domain to be fused and the relationships among those resource entities, so as to construct a service domain knowledge graph library.
In one embodiment, the resource entities include business system software information, embedded software information, and hardware device information.
A fusion module 40, configured to perform cross-service-domain and cross-resource-entity synchronization, fusion and sharing of service domain information based on the service domain knowledge graph library.
As shown in fig. 10, the apparatus for processing multi-source heterogeneous data further includes:
an evaluation module 50, configured to verify the quality of the service data of each service domain according to the pre-configured service domain indexes and trusted source indexes corresponding to each service domain; wherein
the service domain index and the trusted source index are both configured as user-defined indexes for evaluating the service data of different service domains, and the trusted source index serves as the reference standard against which the service domain indexes are compared.
Fig. 8 shows the graph principle of the configuration unit based on the trusted source index. The information structure of a trusted source system often embodies a mature methodology of its business field, so a quantitative evaluation method can be built around the trusted source system and the industry in which it operates; this evaluation method is itself a knowledge graph based on graph data. When an incremental data source is newly introduced into the analysis system, an integrated service domain index model is constructed for it and its evaluation score is examined, so that a decision maker can judge whether the service domain index model is reasonable.
One embodiment of the present application provides a computer device, comprising: a processor, a memory and a computer program stored on the memory, the processor being coupled to the memory, the processor in operation executing the computer program to implement the method of processing multi-source heterogeneous data as described above.
An embodiment of the present application provides a computer-readable storage medium storing computer instructions which, when executed by a computer, cause the computer to perform the method of processing multi-source heterogeneous data as described above.
The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., Digital Versatile Disc (DVD)), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above embodiments are not intended to limit the present application; any modification, equivalent replacement or improvement made within the spirit and principle of the present application shall fall within its protection scope.

Claims (11)

1. A multi-source heterogeneous data processing method is characterized by comprising the following steps:
acquiring service data from a trusted source system; the trusted source system comprises service systems used by terminals of all service domains;
constructing a knowledge graph according to the service data; the knowledge graph can indicate the relation between service data of different service domains and service domain indexes;
acquiring a resource entity of a service domain to be fused and a relation of the resource entity according to the knowledge graph to construct a service domain knowledge graph library;
and performing service domain information synchronization, fusion and sharing operations across service domains and across resource entities based on the service domain knowledge graph library.
2. The method for processing multi-source heterogeneous data according to claim 1, wherein after the performing the operations of service domain information synchronization, fusion and sharing across service domains and across resource entities based on the service domain knowledge graph library, further comprises:
verifying the quality of the service data of each service domain according to the pre-configured service domain indexes and trusted source indexes corresponding to each service domain; wherein
the service domain index and the trusted source index are both configured as user-defined indexes for evaluating the service data of different service domains, and the trusted source index is used as a reference standard for comparing the service domain indexes.
3. The method for processing multi-source heterogeneous data according to claim 1, wherein the constructing a knowledge graph according to the business data comprises:
extracting metadata of the service data to establish standardized graph data; wherein the standardized graph data comprises metadata entities and entity relationships;
and constructing a service domain knowledge graph according to the standardized graph data.
4. The method for processing multi-source heterogeneous data according to claim 3, wherein the performing metadata extraction on the business data to establish standardized graph data comprises:
when the service data is the existing service data, extracting metadata of the existing service data according to a service domain index model to establish standardized graph data;
wherein the service domain indicator model is configured with service domain indicators associated with respective service domains, each of the service domain indicators being assigned a respective weight.
5. The method for processing multi-source heterogeneous data according to claim 4, wherein the performing metadata extraction on the business data to establish standardized graph data further comprises:
when the business data is incremental business data, if the incremental business data belongs to a brand-new business domain, performing metadata extraction on the incremental business data to establish new standardized graph data;
if the incremental business data belongs to an existing business domain, extracting metadata of the incremental business data; and
verifying the metadata according to the service domain index and the trusted source index corresponding to the incremental service data, and establishing new standardized graph data or incrementally merging into the existing standardized graph data according to the verification result.
6. The method for processing multi-source heterogeneous data according to any one of claims 3 to 5, wherein after the extracting metadata from the business data, the method further comprises:
and performing preset data format conversion and persistence on the metadata.
7. The method for processing the multi-source heterogeneous data according to claim 1, wherein the resource entities comprise business system software information, embedded software information, and hardware device information.
8. A device for processing multi-source heterogeneous data, comprising:
the acquisition module is used for acquiring service data from the trusted source system; the trusted source system comprises service systems used by terminals of all service domains;
the first construction module is used for constructing a knowledge graph according to the service data; the knowledge graph can indicate the relation between service data of different service domains and service domain indexes;
the second construction module is used for acquiring the resource entities of the service domain to be fused and the relationship of the resource entities according to the knowledge graph so as to construct a service domain knowledge graph library;
and the fusion module is used for performing service domain information synchronization, fusion and sharing operations across service domains and across resource entities based on the service domain knowledge graph library.
9. The apparatus for processing multi-source heterogeneous data according to claim 8, further comprising:
the evaluation module is used for verifying the quality of the service data of each service domain according to the pre-configured service domain indexes and trusted source indexes corresponding to each service domain; wherein
the service domain index and the trusted source index are both configured as user-defined indexes for evaluating the service data of different service domains, and the trusted source index is used as a reference standard for comparing the service domain indexes.
10. A computer device, comprising: a processor, a memory, and a computer program stored on the memory, the processor coupled to the memory, the processor in operation executing the computer program to implement the method of processing multi-source heterogeneous data of any of claims 1-7.
11. A computer-readable storage medium storing computer instructions which, when executed by a computer, cause the computer to perform the method of processing multi-source heterogeneous data of any one of claims 1 to 7.
CN202111646837.XA 2021-12-30 2021-12-30 Processing method and device of multi-source heterogeneous data, computer equipment and storage medium Pending CN114443854A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111646837.XA CN114443854A (en) 2021-12-30 2021-12-30 Processing method and device of multi-source heterogeneous data, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114443854A true CN114443854A (en) 2022-05-06

Family

ID=81365261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111646837.XA Pending CN114443854A (en) 2021-12-30 2021-12-30 Processing method and device of multi-source heterogeneous data, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114443854A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination