CN115269871A - Enterprise knowledge graph optimization method, system, electronic equipment and storage medium - Google Patents

Enterprise knowledge graph optimization method, system, electronic equipment and storage medium

Info

Publication number
CN115269871A
Authority
CN
China
Prior art keywords
enterprise
data
target
knowledge graph
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210894660.3A
Other languages
Chinese (zh)
Inventor
程光剑
聂志华
杨献祥
李磊
刘锦豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Intelligent Industry Technology Innovation Research Institute
Original Assignee
Jiangxi Intelligent Industry Technology Innovation Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Intelligent Industry Technology Innovation Research Institute
Priority to CN202210894660.3A
Publication of CN115269871A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Economics (AREA)
  • Educational Administration (AREA)
  • Mathematical Physics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Development Economics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Animal Behavior & Ethology (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an enterprise knowledge graph optimization method, system, electronic equipment and storage medium, and belongs to the technical field of knowledge graphs. The method comprises: acquiring target enterprise information comprising unstructured data and preprocessed structured data; extracting initial enterprise entity and relationship information from the target enterprise information, and constructing an initial enterprise knowledge graph based on the initial enterprise entity and relationship information; screening target structured data meeting the growth requirement from the structured data based on a preset growth rating model; and replacing the corresponding structured data in the initial enterprise entity and relationship information with the target structured data, so as to optimize the initial enterprise knowledge graph and obtain the target enterprise knowledge graph. In this way, redundant basic data in the initial enterprise knowledge graph can be screened out to optimize the graph, so that the indexes selected in the knowledge graph meet customer expectations, thereby improving data quality and the data utilization rate.

Description

Enterprise knowledge graph optimization method, system, electronic equipment and storage medium
Technical Field
The invention belongs to the technical field of knowledge graphs, and particularly relates to an enterprise knowledge graph optimization method, an enterprise knowledge graph optimization system, electronic equipment and a storage medium.
Background
With the rise of the big data era and the shift of the research focus of artificial intelligence from perceptual intelligence to cognitive intelligence, interest in knowledge graphs has grown rapidly. A knowledge graph, also called a scientific knowledge map, is a family of graphs that display the development process and structural relationships of knowledge; it describes knowledge resources and their carriers using visualization techniques, and mines, analyzes, constructs, draws and displays knowledge and the interrelations among knowledge items. As one application field of artificial intelligence technology, the knowledge graph has strong semantic processing and data structuring capabilities and provides a foundation for intelligent information applications. By constructing a semantic network of entities and relations, the knowledge graph integrates, cross-associates, analyzes and compares large-scale data and knowledge, mines the data in depth, supports intelligent understanding, representation, reasoning, retrieval and serving of knowledge, and provides users with self-service iterative analysis capability.
At present, a common method for constructing an enterprise knowledge graph is as follows: obtain the dimension information data specified for enterprises in a given field, extract entities and relations from the data using methods such as natural language processing or deep learning, and finally construct the enterprise knowledge graph from the extracted entity and relationship information. However, such construction focuses mainly on integrating and associating various structured or unstructured basic data and does not effectively screen out redundant basic data. As a result, the indexes selected in the constructed knowledge graph do not meet customer expectations and the required storage space is large; that is, the data quality of the constructed knowledge graph is poor and customer requirements are difficult to meet.
Disclosure of Invention
In order to solve the above technical problems, the invention provides an enterprise knowledge graph optimization method, system, electronic equipment and storage medium, which screen out redundant basic data in an initial enterprise knowledge graph so as to optimize it, so that the indexes selected in the knowledge graph meet customer expectations and the data quality and data utilization rate are improved.
In a first aspect, an embodiment of the present application provides an enterprise knowledge graph optimization method, including:
acquiring target enterprise information comprising unstructured data and preprocessed structured data;
extracting initial enterprise entity and relationship information from the target enterprise information, and constructing an initial enterprise knowledge graph based on the initial enterprise entity and relationship information;
screening target structured data meeting the growth requirement from the structured data based on a preset growth rating model;
and replacing the corresponding structured data in the initial enterprise entity and relationship information with the target structured data, so as to optimize the initial enterprise knowledge graph and obtain the target enterprise knowledge graph.
Preferably, the step of extracting initial enterprise entity and relationship information from the target enterprise information and constructing an initial enterprise knowledge graph based on the initial enterprise entity and relationship information includes:
extracting entity and relationship information which accord with a preset standard from the structured data according to a preset condition;
carrying out enterprise entity identification and relationship extraction on the unstructured enterprise data by adopting natural language processing or deep learning technology;
acquiring initial enterprise entity and relationship information based on the processing results of the structured data and the unstructured data;
and constructing an initial enterprise knowledge graph based on the initial enterprise entity and the relationship information.
Preferably, the step of screening out the target structured data meeting the growth requirement from the structured data based on the preset growth rating model includes:
normalizing the structured data to obtain index data, and constructing an enterprise index system according to the index data;
performing weight calculation on the index data by adopting an objective weighting method;
analyzing the weight result of the index data, and screening target index data from the index data;
carrying out weight calculation on the target index data again, and calculating the weighted sum of the numerical values of all indexes in the target index data and the corresponding weights of the numerical values;
normalizing the weighted sum again and taking the result as the growth score of the target enterprise;
and outputting the target structured data corresponding to the target index data meeting the growth requirement.
Preferably, the step of normalizing the structured data to obtain index data and constructing an enterprise index system according to the index data includes:
integrating and extracting the structured data, screening out and removing abnormal values in the structured data, and filling null values in the structured data with the mean value of the target enterprise over the last three years;
normalizing the integrated and extracted structured data to obtain index data, wherein the index data comprises financial indexes and non-financial indexes;
and constructing an enterprise index system according to the index data.
Preferably, the step of outputting the target structured data corresponding to the target index data meeting the growth requirement specifically includes:
classifying the target enterprise into a growth stage, a maturity stage or a decline stage according to the growth score;
verifying the classification of the target enterprise with a classification algorithm, wherein the verification checks the precision, recall, F1 score and AUC results;
and outputting the target structured data corresponding to the target index data which meets the growth requirement.
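The verification metrics named above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names, the one-versus-rest treatment of a stage label, and the pairwise definition of AUC are assumptions made for illustration only.

```python
def classification_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 for one stage label treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auc(y_true, scores, positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Here `y_true` would hold the reference stage of each enterprise and `y_pred` the stage derived from its growth score; both names are hypothetical.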
Preferably, the step of replacing the corresponding structured data in the initial enterprise entity and relationship information with the target structured data, so as to optimize the initial enterprise knowledge graph and obtain the target enterprise knowledge graph, specifically includes:
sorting the target structured data by importance and selecting important structured data at a preset importance level, wherein the importance is evaluated according to the grade of an index;
and based on the initial enterprise knowledge graph, replacing the corresponding structured data in the initial enterprise entity and relationship information with the important structured data to optimize the initial enterprise knowledge graph and obtain a target enterprise knowledge graph.
Preferably, the preprocessing of the structured data includes one of, or a combination of two or more of, missing value processing, abnormal value processing, and transformation-and-merging processing.
In a second aspect, an embodiment of the present application provides an enterprise knowledge graph optimization system, including:
the acquisition module is used for acquiring target enterprise information comprising unstructured data and preprocessed structured data;
the construction module is used for extracting initial enterprise entity and relationship information from the target enterprise information and constructing an initial enterprise knowledge graph based on the initial enterprise entity and relationship information;
the screening module is used for screening out target structured data meeting the growth requirement from the structured data based on a preset growth rating model;
and the optimization module is used for replacing the corresponding structured data in the initial enterprise entity and relationship information with the target structured data so as to optimize the initial enterprise knowledge graph to obtain a target enterprise knowledge graph.
Preferably, the construction module comprises:
the extraction unit is used for extracting the entity and the relation information which accord with the preset standard from the structured data according to the preset condition;
the extraction unit is used for carrying out enterprise entity identification and relationship extraction on the unstructured enterprise data by adopting natural language processing or deep learning technology;
the acquiring unit is used for acquiring initial enterprise entity and relationship information based on the processing results of the structured data and the unstructured data;
and the construction unit is used for constructing an initial enterprise knowledge graph based on the initial enterprise entity and the relationship information.
Preferably, the screening module comprises:
the first processing unit is used for carrying out normalization processing on the structured data to obtain index data and constructing an enterprise index system according to the index data;
the first calculation unit is used for performing weight calculation on the index data by adopting an objective weighting method;
the screening unit is used for analyzing the weight result of the index data and screening target index data from the index data;
the second calculation unit is used for carrying out weight calculation on the target index data again and calculating the weighted sum of the numerical values of all indexes in the target index data and the corresponding weights of the numerical values;
the second processing unit is used for normalizing the weighted sum again and taking the result as the growth score of the target enterprise;
and the output unit is used for outputting the target structured data corresponding to the target index data meeting the growth requirement.
Preferably, the optimization module comprises:
the sorting unit is used for sorting according to importance based on the target structured data and selecting important structured data with preset importance levels, wherein the importance is evaluated according to the grade of an index;
and the replacing unit is used for replacing the corresponding structured data in the initial enterprise entity and relationship information with the important structured data based on the initial enterprise knowledge graph so as to optimize the initial enterprise knowledge graph and obtain a target enterprise knowledge graph.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor, when executing the computer program, implements the enterprise knowledge graph optimization method according to the first aspect.
In a fourth aspect, the present application provides a storage medium, on which a computer program is stored, which when executed by a processor, implements the enterprise knowledge graph optimization method according to the first aspect.
Compared with the prior art, the enterprise knowledge graph optimization method, system, electronic equipment and storage medium proceed as follows. First, structured data covering the enterprise basic information and enterprise operation financial information of a target enterprise are collected and their invalid data are processed, and unstructured text data such as policies related to the target enterprise and enterprise public opinion are collected, to obtain the target enterprise information. Second, entity and relationship information is extracted from the processed structured data according to preset conditions to obtain enterprise entity and relationship information that meets the standard; entity recognition and relationship extraction are performed on the unstructured data using natural language processing technology, and the sentiment polarity of enterprise news is identified using a deep learning algorithm; the initial enterprise entity and relationship information is thereby obtained, and an initial enterprise knowledge graph is constructed from it. Third, a growth rating model is constructed based on a machine learning algorithm, and target structured data meeting the growth requirement are screened out based on the growth rating model. Finally, the target structured data are sorted by importance, important structured data at a preset importance level are selected, and the corresponding structured data in the initial enterprise entity and relationship information are replaced with the important structured data to obtain the target enterprise knowledge graph.
Through the above steps, redundant basic data in the initial enterprise knowledge graph can be screened out to optimize it, so that the indexes selected in the knowledge graph meet customer expectations and the data quality and data utilization rate are improved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of an enterprise knowledge graph optimization method according to an embodiment of the present invention;
FIG. 2 is a detailed flowchart of step S103 according to a first embodiment of the present invention;
FIG. 3 is a block diagram of an enterprise knowledge graph optimization system corresponding to a method according to a second embodiment of the present invention;
FIG. 4 is a schematic diagram of a hardware structure of an electronic device according to a third embodiment of the present invention.
Description of reference numerals:
10-an acquisition module;
20-construction module, 21-extraction unit, 22-extraction unit, 23-acquisition unit and 24-construction unit;
30-screening module, 31-first processing unit, 32-first calculating unit, 33-screening unit, 34-second calculating unit, 35-second processing unit and 36-output unit;
40-an optimization module, 41-a sorting unit and 42-a replacing unit;
50-bus, 51-processor, 52-memory, 53-communication interface.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be illustrative of the embodiments of the present invention, and should not be construed as limiting the invention.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by one of ordinary skill in the art that the embodiments described herein may be combined with other embodiments without conflict.
At present, the construction of enterprise knowledge graphs focuses mainly on integrating and associating various structured or unstructured basic data and does not effectively screen out redundant basic data. As a result, the indexes selected in the constructed knowledge graph do not meet customer expectations, the required storage space is large, and the constructed knowledge graph has poor data quality and a low data utilization rate.
Therefore, the invention provides an enterprise knowledge graph optimization method, system, electronic equipment and storage medium, in which the initial enterprise knowledge graph is optimized by screening out its redundant basic data, so that the indexes selected in the knowledge graph meet customer expectations and the data quality and data utilization rate are improved.
Example 1
The embodiment provides an enterprise knowledge graph optimization method. FIG. 1 is a flowchart of an enterprise knowledge graph optimization method of the embodiment, and as shown in FIG. 1, the flowchart includes the following steps:
S101, acquiring target enterprise information comprising unstructured data and preprocessed structured data;
The structured data comprises text-type data such as enterprise basic information and numerical data such as enterprise operation financial information. The enterprise basic information refers to information such as the enterprise name, company address and registration time; the enterprise operation financial information refers to financial information such as enterprise profit data, asset and liability data and cash flow data. The unstructured data refers to text data such as enterprise-related policies and enterprise public opinion; it serves both as training data for downstream tasks and as the source table data for the enterprise public opinion display at the front end;
Specifically, invalid data in the enterprise basic information (text type) are handled as follows:
1. Missing value processing: when the collected data are severely missing, the data are deleted and the missing content is re-collected by a web crawler.
2. Abnormal value processing: when the data are not severely missing, abnormal values are replaced with empirical values.
3. Transformation and merging processing: for example, date data with format problems are converted.
Invalid data in the enterprise operation financial information (numerical type) are handled as follows:
1. Missing value processing: when the collected data are severely missing, they are deleted; when the missing is not severe, they are filled by median filling, mean filling, mode filling or algorithm-based filling.
2. Abnormal value processing: when the collected data contain abnormal values, the options include deletion, estimation using the mean or median, and logarithmic transformation of the variable.
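Two of the preprocessing operations described for step S101 can be sketched in Python as follows. This is only an illustrative sketch: the function names, the `max_missing_ratio` threshold used to decide that a column is severely missing, and the list of accepted date formats are assumptions that do not appear in the source.

```python
from datetime import datetime
from statistics import median


def fill_missing(values, max_missing_ratio=0.5):
    """Median-fill a numeric column, or return None when it should be deleted.

    max_missing_ratio is a hypothetical threshold for "severely missing".
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    if not present or missing > max_missing_ratio * len(values):
        return None  # severely missing: the column is deleted
    fill = median(present)
    return [fill if v is None else v for v in values]


def normalize_date(text, formats=("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d")):
    """Convert a date string in one of the assumed formats to ISO form, else None."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Mean or mode filling would follow the same pattern with `statistics.mean` or `statistics.mode` in place of `median`.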
S102, extracting initial enterprise entity and relationship information from the target enterprise information, and constructing an initial enterprise knowledge graph based on the initial enterprise entity and relationship information;
A knowledge graph essentially condenses objective experience into a vast network, in which nodes represent entities or concepts and edges represent semantic relationships between entities or concepts. Mature graph databases such as Neo4j, Dgraph and JanusGraph may be used to store the knowledge graph. Knowledge graphs such as those of Google and Baidu are built top-down with classification algorithms: extracted knowledge is classified into an existing schema, and if it cannot be classified, a new schema pattern must be generated to match it. Besides the schema, constructing a knowledge graph also requires filling the graph with content; this process is knowledge extraction.
Further, the specific steps of step S102 include:
S1021, extracting entity and relationship information that meets a preset standard from the structured data according to a preset condition;
The preset condition refers to extracting the entities and relations related to the target enterprise. Entities include, for example, the enterprise's Chinese name, enterprise ID, names of senior executives and names of competitor enterprises; relations include, for example, the enterprise's outbound investment relationships, supplier relationships, customer relationships and shareholder holding relationships. The preset standard refers to conformance with the data storage format of the Neo4j graph database;
Specifically, the collected text-type data of the target enterprise, such as enterprise basic information, and numerical data, such as enterprise operation financial information, are processed into a data storage format that meets the requirements of the graph database; the purpose is to adjust the format of the target enterprise's structured data so that it fits the data format required for knowledge graph construction.
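As a rough illustration of step S1021, the sketch below turns one structured enterprise record into node dictionaries and (head, relation, tail) triples that a graph database loader could consume. The record field names, node labels and relation names are all hypothetical; the source does not specify the concrete storage layout.

```python
def record_to_graph(record):
    """Turn one enterprise record into node dicts and (head, relation, tail) triples."""
    company = record["name"]
    nodes = [{"label": "Enterprise", "id": record["enterprise_id"], "name": company}]
    triples = []
    # Hypothetical mapping: record field -> (node label, relation name).
    relation_fields = {
        "executives": ("Person", "HAS_EXECUTIVE"),
        "suppliers": ("Enterprise", "HAS_SUPPLIER"),
        "investments": ("Enterprise", "INVESTS_IN"),
    }
    for field, (label, relation) in relation_fields.items():
        for target in record.get(field, []):
            nodes.append({"label": label, "name": target})
            triples.append((company, relation, target))
    return nodes, triples
```

Keeping nodes and triples as plain Python structures leaves the choice of graph database and import mechanism open.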
S1022, carrying out enterprise entity identification and relationship extraction on the unstructured enterprise data by adopting natural language processing or deep learning technology;
Natural language processing technology is a general term for all technologies related to computer processing of natural language; its aim is to enable a computer to understand and accept instructions that humans enter in natural language and to perform functions such as translation from one language to another. The natural language processing technology may specifically adopt LSTM-CRF, where LSTM is a Long Short-Term Memory neural network and CRF is a conditional random field. The sentiment polarity of enterprise news refers to whether the public opinion expressed by a news item tends to be positive, neutral or negative; it is used as a feature for downstream tasks.
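A sequence-labelling model such as LSTM-CRF typically emits one tag per token, and those tags must be grouped into entity spans before they can become graph nodes. The sketch below shows only that decoding step, assuming a BIO tagging scheme and whitespace-joined tokens; the tag scheme, the `ORG` label and the function name are assumptions, and the model itself is not shown.

```python
def decode_bio(tokens, tags):
    """Group tokens into (entity_text, entity_type) spans from BIO tags."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity still open.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            # "O" or an inconsistent "I-" tag ends the open entity.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities
```

For Chinese text the tokens would usually be joined without spaces; the join character is an illustrative choice.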
And S1023, acquiring initial enterprise entity and relationship information based on the processing results of the structured data and the unstructured data.
And S1024, constructing an initial enterprise knowledge graph based on the initial enterprise entity and the relationship information.
S103, screening out target structured data meeting the growth requirement from the structured data based on a preset growth rating model;
Enterprise growth evaluation is the judgment and evaluation of the future growth potential of an enterprise. It is used to identify the enterprise's future development trend and potential, including measuring its development direction, speed, capability and results. The core of enterprise growth evaluation is the evaluation of the enterprise's growth space and growth potential;
Specifically, the preset growth rating model reflects the comprehensive growth capability of the enterprise from different aspects by setting indexes covering expansion capability, profitability, operating capability, cash strength and technological innovation capability. All of these indexes are positive indexes, and each growth evaluation index has a linear relationship with enterprise growth, so the established growth evaluation model is a multivariate first-order linear evaluation model.
Further, as shown in fig. 2, the specific steps of step S103 include:
S1031, normalizing the structured data to obtain index data, and constructing an enterprise index system according to the index data;
Step S1031 is completed by the following steps:
S10311, integrating and extracting the structured data, screening out and removing abnormal values in the structured data, and filling null values in the structured data with the mean value of the target enterprise over the last three years;
Specifically, when an abnormal value exists in the extracted structured data, the data is unreliable, so the abnormal value is removed directly to prevent the corresponding index from affecting the evaluation result; null values are filled with the mean value of that index for the target enterprise over the last three years, so as to minimize the influence of the null values.
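Step S10311 can be sketched as follows for a single index of a single enterprise. The source does not say how abnormal values are detected or exactly which three years are averaged, so the caller-supplied `is_abnormal` predicate and the choice of the three most recent non-null years are assumptions.

```python
from statistics import mean


def clean_indicator(yearly_values, is_abnormal, window=3):
    """Clean one index given as a dict mapping year -> value (or None).

    Abnormal entries are dropped; nulls are filled with the mean of the
    most recent `window` non-null years (an assumed reading of the source).
    """
    kept = {y: v for y, v in yearly_values.items() if v is None or not is_abnormal(v)}
    recent = [kept[y] for y in sorted(kept, reverse=True) if kept[y] is not None][:window]
    if not recent:
        return kept  # nothing to average, leave nulls as they are
    fill = mean(recent)
    return {y: (fill if v is None else v) for y, v in kept.items()}
```

For example, `is_abnormal` could be a simple range check such as `lambda v: v > 1e6`; that threshold is purely illustrative.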
S10312, performing normalization processing on the integrated and extracted structural data to obtain index data, wherein the index data comprises financial indexes and non-financial indexes;
specifically, the values of different structured data differ greatly in magnitude: some data, such as growth rates, generally lie between 0 and 1, while other data may have very large absolute values. Without normalization, indexes with large values would have a disproportionately large influence, which would be unfair to the evaluation of the other structured data. Normalization effectively balances the influence of each item of structured data on the final evaluation result, so that the finally calculated enterprise growth score is more reasonable.
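The text does not name a particular normalization formula. As one common choice, and purely as an assumption, min-max scaling maps every index onto the range 0 to 1 so that a large-valued index such as total assets no longer dominates a growth rate. The sample values below are invented for illustration.

```python
def min_max_normalize(values):
    """Scale a list of numbers linearly onto [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant index carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Two indexes on very different scales for three hypothetical enterprises.
growth_rates = [0.05, 0.10, 0.25]
total_assets = [2.0e6, 8.0e6, 5.0e6]

norm_growth = min_max_normalize(growth_rates)
norm_assets = min_max_normalize(total_assets)
```

After scaling, both indexes span the same range, so their influence on the score is governed by their weights rather than by their units.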
And S10313, constructing an enterprise index system according to the index data.
S1032, performing weight calculation on the index data by adopting an objective weighting method;
specifically, the objective weighting method determines the weights mathematically from the relationships among the preprocessed index data; the result does not depend on subjective human judgment, and the method has a strong mathematical foundation. By way of example and not limitation, in the embodiment of the present invention, calculating the weights of the preprocessed index data by a combined objective weighting method is only a preferred calculation method, and the present application does not specifically limit the weight calculation method for the preprocessed index data.
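The text does not specify which objective weighting methods are combined. As an illustrative assumption, the sketch below uses the entropy weight method, one widely used objective weighting technique: an index whose values vary more across enterprises receives a larger weight. The function name and sample matrix are hypothetical, and at least two enterprises are assumed.

```python
import math

def entropy_weights(matrix):
    """Entropy weight method (an assumed example of objective weighting).

    matrix: rows are enterprises, columns are indexes normalized to [0, 1].
    Returns one weight per index; the weights sum to 1.
    """
    n = len(matrix)          # number of enterprises, assumed >= 2
    m = len(matrix[0])       # number of indexes
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total == 0:
            divergences.append(0.0)
            continue
        entropy = 0.0
        for v in column:
            p = v / total
            if p > 0:
                entropy -= p * math.log(p)
        entropy /= math.log(n)
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    s = sum(divergences)
    if s == 0:
        return [1.0 / m] * m
    return [d / s for d in divergences]

matrix = [
    [0.2, 0.5],
    [0.9, 0.5],
    [0.1, 0.5],
]
weights = entropy_weights(matrix)
```

In this example the second index is identical for every enterprise, so it cannot distinguish them and receives essentially zero weight.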
S1033, analyzing the weight result of the index data, and screening target index data from the index data;
specifically, by analyzing the weight result of the index data, the index data with low weight value and high similarity is filtered and removed, so that the filtered index data has better representativeness.
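The screening in step S1033 could be sketched as below. The thresholds `min_weight` and `max_corr`, the use of Pearson correlation as the similarity measure, and all index names are assumptions made for illustration; the text only states that low-weight and highly similar index data are removed.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def select_indicators(columns, weights, min_weight=0.05, max_corr=0.9):
    """Keep indexes that are heavy enough and not near-duplicates.

    Indexes are visited from highest to lowest weight, so of two highly
    correlated indexes the one with the larger weight is kept.
    """
    kept = []
    for name in sorted(columns, key=lambda k: weights[k], reverse=True):
        if weights[name] < min_weight:
            continue
        if any(abs(pearson(columns[name], columns[k])) > max_corr for k in kept):
            continue
        kept.append(name)
    return kept

columns = {
    "revenue_growth": [0.1, 0.4, 0.8, 0.3],
    "sales_growth": [0.12, 0.41, 0.79, 0.33],
    "rd_ratio": [0.9, 0.2, 0.5, 0.1],
    "misc": [0.5, 0.5, 0.6, 0.5],
}
weights = {"revenue_growth": 0.40, "sales_growth": 0.30, "rd_ratio": 0.28, "misc": 0.02}
selected = select_indicators(columns, weights)
```

Here `sales_growth` is dropped because it nearly duplicates `revenue_growth`, and `misc` is dropped because its weight falls below the assumed threshold.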
S1034, carrying out weight calculation on the target index data again, and calculating the weighted sum of the values of the indexes in the target index data and their corresponding weights;
S1035, normalizing the weighted sum again, the result being the growth score of the target enterprise;
specifically, the growth score is obtained by calculating the weighted sum of the values of the screened index data and their corresponding weights and then normalizing it. The resulting enterprise growth score has high accuracy, and this calculation has good resistance to overfitting and noise, so the growth evaluation result of the enterprise is more reliable.
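Steps S1034 and S1035 can be illustrated with the following sketch. The 0 to 100 score scale and the min-max form of the second normalization are assumptions, as are the function name and sample data; the text only says that the weighted sum is normalized again.

```python
def growth_scores(rows, weights):
    """Weighted sum per enterprise, rescaled onto an assumed 0-100 scale.

    rows:    one list of normalized index values per enterprise
    weights: one weight per index, in the same order
    """
    raw = [sum(w * x for w, x in zip(weights, row)) for row in rows]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [0.0] * len(raw)
    return [100.0 * (r - lo) / (hi - lo) for r in raw]

rows = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
weights = [0.6, 0.4]
scores = growth_scores(rows, weights)
```

The second normalization makes scores comparable across runs that use different index sets or weights.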
S1036, outputting target structured data corresponding to the target index data meeting the growth requirement;
specifically, step S1036 is completed by the following procedure:
S10361, classifying the target enterprise into a growth period, a maturity period or a decline period according to the growth score;
specifically, the growth stages are defined as a growth period, a maturity period and a decline period, and the growth score of the target enterprise is trained and evaluated against these stages to determine the growth stage of the target enterprise. This completes the rating of the target enterprise.
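A minimal sketch of mapping a growth score to a life-cycle stage follows. The cut-off values and the assumption that a higher score corresponds to the growth period are illustrative only; the text obtains the division through training and evaluation rather than fixed thresholds.

```python
def life_cycle_stage(score, growth_cutoff=60.0, decline_cutoff=30.0):
    """Map a 0-100 growth score to a stage using assumed cut-offs."""
    if score >= growth_cutoff:
        return "growth"
    if score >= decline_cutoff:
        return "maturity"
    return "decline"

stages = [life_cycle_stage(s) for s in (85.0, 45.0, 10.0)]
```

In practice the two cut-offs would be the parameters tuned when comparing candidate divisions.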
S10362, verifying the division effect of the target enterprise through a classification algorithm, wherein the verification refers to the verification of precision ratio, recall ratio, F1 value and AUC value results;
the classification algorithm comprises a random forest algorithm and an XGBoost algorithm. The F1 value is a special case of the F-measure: it is the harmonic mean of precision and recall and comprehensively reflects the accuracy of the classification result; the closer it is to 1, the higher the accuracy. The AUC (Area Under the ROC Curve) value is a standard measure of the quality of a classification model and serves as a comprehensive index for evaluating its accuracy.
Specifically, training and evaluation results for various division combinations are obtained through the random forest algorithm and the XGBoost algorithm, the growth scores are mapped to the corresponding enterprise life-cycle stages according to the locally optimal division, and the final rating effect is verified through a multi-class classification algorithm.
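To make the verification metrics concrete, the sketch below computes precision, recall and the F1 value for one stage treated as the positive class. The labels are invented sample data and the function name is hypothetical; a real verification would use the predictions of the trained classifiers.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class in a multi-class setting."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

y_true = ["growth", "growth", "maturity", "decline", "growth", "maturity"]
y_pred = ["growth", "maturity", "maturity", "decline", "growth", "growth"]
p, r, f1 = precision_recall_f1(y_true, y_pred, "growth")
```

Averaging the per-class F1 values over the three stages would give a single macro F1 figure for comparing candidate divisions.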
And S10363, outputting the target structured data corresponding to the target index data meeting the growth requirement.
S104, replacing the corresponding structured data in the initial entity and relationship information with the target structured data so as to optimize the initial enterprise knowledge graph to obtain a target enterprise knowledge graph;
further, the specific steps of step S104 include:
S1041, sorting the target structured data by importance, and selecting important structured data of a preset importance level, wherein the importance is evaluated according to the score of the index;
S1042, based on the initial enterprise knowledge graph, replacing the corresponding structured data in the initial enterprise entity and relationship information with the important structured data, so as to optimize the initial enterprise knowledge graph and obtain a target enterprise knowledge graph.
Through the above steps, redundant basic data in the initial enterprise knowledge graph can be screened out to optimize the initial knowledge graph, so that the indexes selected from the knowledge graph meet customer expectations and data quality and data utilization are improved.
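The replacement in step S104 might look like the following sketch. The dictionary layout of an entity, the field names `indicators` and `relations`, the parameter `top_k` standing in for the preset importance level, and all sample values are assumptions; the text does not prescribe a graph storage format.

```python
def optimize_entity(entity, target_data, index_scores, top_k):
    """Replace an entity's structured indexes with the most important ones.

    entity:       dict with "name", "relations" and "indicators" (assumed layout)
    target_data:  {index name: value} that passed the growth screening
    index_scores: {index name: importance score}
    top_k:        how many indexes to keep (stands in for the preset level)
    """
    ranked = sorted(target_data, key=lambda n: index_scores[n], reverse=True)
    important = {n: target_data[n] for n in ranked[:top_k]}
    updated = dict(entity)            # leave the initial graph untouched
    updated["indicators"] = important
    return updated

initial = {
    "name": "ExampleCo",
    "relations": [("ExampleCo", "invests_in", "OtherCo")],
    "indicators": {"revenue_growth": 0.10, "headcount": 120, "misc_ratio": 0.5},
}
target_data = {"revenue_growth": 0.25, "rd_ratio": 0.08, "net_margin": 0.12}
index_scores = {"revenue_growth": 0.5, "rd_ratio": 0.3, "net_margin": 0.2}
optimized = optimize_entity(initial, target_data, index_scores, top_k=2)
```

Returning a new entity instead of mutating the original keeps the initial knowledge graph available for comparison with the target knowledge graph.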
Example 2
This embodiment provides a block diagram of a system corresponding to the method described in embodiment 1. FIG. 3 is a block diagram of an enterprise knowledge-graph optimization system according to an embodiment of the present application, and as shown in FIG. 3, the system includes:
an obtaining module 10, configured to obtain target enterprise information including unstructured data and preprocessed structured data;
a construction module 20, configured to extract initial enterprise entity and relationship information from the target enterprise information, and construct an initial enterprise knowledge graph based on the initial enterprise entity and relationship information;
the screening module 30 is configured to screen out target structured data meeting the growth requirement from the structured data based on a preset growth rating model;
and the optimization module 40 is configured to replace the corresponding structured data in the initial entity and relationship information with the target structured data, so that the initial enterprise knowledge graph is optimized to obtain a target enterprise knowledge graph.
Preferably, the building module 20 comprises:
an extracting unit 21, configured to extract, according to a preset condition, entity and relationship information that meet a preset standard from the structured data;
the extraction unit 22 is used for performing enterprise entity identification and relationship extraction on the unstructured enterprise data by adopting natural language processing or deep learning technology;
an obtaining unit 23, configured to obtain initial enterprise entity and relationship information based on the processing results of the structured data and the unstructured data;
and the constructing unit 24 is configured to construct an initial enterprise knowledge graph based on the initial enterprise entity and the relationship information.
Preferably, the screening module 30 comprises:
the first processing unit 31 is configured to normalize the structured data to obtain index data, and construct an enterprise index system according to the index data;
a first calculation unit 32 configured to perform weight calculation on the index data by using an objective weighting method;
a screening unit 33, configured to analyze a weight result of the index data, and screen target index data from the index data;
the second calculating unit 34 is configured to perform weight calculation on the target index data again, and calculate a weighted sum of numerical values of each index in the target index data and corresponding weight of each index;
the second processing unit 35 is configured to normalize the weighted sum again and take the result as the growth score of the target enterprise;
and the output unit 36 is configured to output the target structured data corresponding to the target index data meeting the growth requirement.
Preferably, the optimization module 40 comprises:
the sorting unit 41 is configured to sort according to importance based on the target structured data and select important structured data of a preset importance level, where the importance is evaluated according to the score of an index;
a replacing unit 42, configured to replace, based on the initial enterprise knowledge graph, the corresponding structured data in the initial enterprise entity and relationship information with the important structured data to optimize the initial enterprise knowledge graph, so as to obtain a target enterprise knowledge graph.
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
Example 3
The enterprise knowledge graph optimization method of the present invention described in connection with FIG. 1 may be implemented by an electronic device. Fig. 4 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
The electronic device may comprise a processor 51 and a memory 52 in which computer program instructions are stored.
Specifically, the processor 51 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
Memory 52 may include, among other things, mass storage for data or instructions. By way of example, and not limitation, memory 52 may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 52 may include removable or non-removable (or fixed) media, where appropriate. The memory 52 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 52 is a non-volatile memory. In certain embodiments, memory 52 includes Read-Only Memory (ROM) and Random Access Memory (RAM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate. The RAM may be a Static Random-Access Memory (SRAM) or a Dynamic Random-Access Memory (DRAM), where the DRAM may be a Fast Page Mode Dynamic Random-Access Memory (FPMDRAM), an Extended Data Out Dynamic Random-Access Memory (EDODRAM), a Synchronous Dynamic Random-Access Memory (SDRAM), and the like.
The memory 52 may be used to store or cache various data files that need to be processed and/or used for communication, as well as possible computer program instructions executed by the processor 51.
The processor 51 implements the enterprise knowledge graph optimization method of embodiment 1 described above by reading and executing computer program instructions stored in the memory 52.
In some of these embodiments, the electronic device may also include a communication interface 53 and a bus 50. As shown in fig. 4, the processor 51, the memory 52, and the communication interface 53 are connected via the bus 50 to complete mutual communication.
The communication interface 53 is used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present application. The communication interface 53 may also enable data communication with other components such as external equipment, image/data acquisition equipment, databases, external storage, and image/data processing workstations.
Bus 50 includes hardware, software, or both, coupling the components of the electronic device to one another. Bus 50 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example and not limitation, bus 50 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. Bus 50 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
The electronic device may execute the enterprise knowledge graph optimization method of embodiment 1 of the present application based on the obtained enterprise knowledge graph optimization system.
In addition, in combination with the enterprise knowledge graph optimization method of embodiment 1, the embodiments of the present application may provide a storage medium for implementing it. The storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement the enterprise knowledge graph optimization method of embodiment 1 above.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. An enterprise knowledge graph optimization method is characterized by comprising the following steps:
acquiring target enterprise information comprising unstructured data and preprocessed structured data;
extracting initial enterprise entity and relationship information from the target enterprise information, and constructing an initial enterprise knowledge graph based on the initial enterprise entity and relationship information;
screening target structured data meeting the growth requirement from the structured data based on a preset growth rating model;
and replacing the corresponding structured data in the initial entity and relationship information with the target structured data, so as to optimize the initial enterprise knowledge graph to obtain the target enterprise knowledge graph.
2. The enterprise knowledge graph optimization method of claim 1, wherein the step of extracting initial enterprise entity and relationship information from the target enterprise information and constructing an initial enterprise knowledge graph based on the initial enterprise entity and relationship information comprises:
extracting entity and relationship information which accord with a preset standard from the structured data according to a preset condition;
carrying out enterprise entity identification and relationship extraction on the unstructured enterprise data by adopting natural language processing or deep learning technology;
acquiring initial enterprise entity and relationship information based on the processing results of the structured data and the unstructured data;
and constructing an initial enterprise knowledge graph based on the initial enterprise entity and the relationship information.
3. The enterprise knowledge graph optimizing method of claim 1, wherein the step of screening out the target structured data meeting the growth requirement from the structured data based on a preset growth rating model comprises:
normalizing the structured data to obtain index data, and constructing an enterprise index system according to the index data;
performing weight calculation on the index data by adopting an objective weighting method;
analyzing the weight result of the index data, and screening target index data from the index data;
carrying out weight calculation on the target index data again, and calculating the weighted sum of the numerical values of all indexes in the target index data and the corresponding weights of the numerical values;
normalizing the weighted sum again, the result being the growth score of the target enterprise;
and outputting the target structured data corresponding to the target index data meeting the growth requirement.
4. The enterprise knowledge graph optimizing method of claim 3, wherein the step of normalizing the structured data to obtain index data and constructing an enterprise index system according to the index data comprises:
integrating and extracting the structured data, screening and removing abnormal values in the structured data, and supplementing null values in the structured data based on the mean value of the target enterprise in the last three years;
normalizing the integrated and extracted structural data to obtain index data, wherein the index data comprises financial indexes and non-financial indexes;
and constructing an enterprise index system according to the index data.
5. The enterprise knowledge graph optimizing method of claim 3, wherein the step of outputting the target structured data corresponding to the target index data meeting the growth requirement specifically comprises:
dividing the target enterprise into a growth stage, a maturity stage or a decline stage according to the growth score;
verifying the division effect of the target enterprise through a classification algorithm, wherein the verification refers to the verification of precision ratio, recall ratio, F1 value and AUC value results;
and outputting the target structured data corresponding to the target index data meeting the growth requirement.
6. The method according to claim 1, wherein the step of replacing the corresponding structured data in the initial entity and relationship information with the target structured data to optimize the initial enterprise knowledge graph to obtain the target enterprise knowledge graph specifically comprises:
sorting the target structured data according to importance, and selecting important structured data with preset importance levels, wherein the importance is evaluated according to the grade of an index;
and based on the initial enterprise knowledge graph, replacing the corresponding structured data in the initial enterprise entity and relationship information with the important structured data to optimize the initial enterprise knowledge graph to obtain a target enterprise knowledge graph.
7. The enterprise knowledge graph optimization method of any one of claims 1-6, wherein the preprocessing of the structured data comprises one or a combination of two or more of missing value processing, outlier processing, or transform-and-merge value processing.
8. An enterprise knowledge graph optimization system is characterized by comprising:
the acquisition module is used for acquiring target enterprise information comprising unstructured data and preprocessed structured data;
the construction module is used for extracting initial enterprise entity and relationship information from the target enterprise information and constructing an initial enterprise knowledge graph based on the initial enterprise entity and relationship information;
the screening module is used for screening out target structured data meeting the growth requirement from the structured data based on a preset growth rating model;
and the optimization module is used for replacing the corresponding structured data in the initial entity and relationship information with the target structured data so as to optimize the initial enterprise knowledge graph to obtain a target enterprise knowledge graph.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the enterprise knowledge graph optimization method of any one of claims 1-7.
10. A computer-readable storage medium, having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the enterprise knowledge graph optimization method according to any one of claims 1 to 7.
CN202210894660.3A 2022-07-28 2022-07-28 Enterprise knowledge graph optimization method, system, electronic equipment and storage medium Pending CN115269871A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210894660.3A CN115269871A (en) 2022-07-28 2022-07-28 Enterprise knowledge graph optimization method, system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210894660.3A CN115269871A (en) 2022-07-28 2022-07-28 Enterprise knowledge graph optimization method, system, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115269871A true CN115269871A (en) 2022-11-01

Family

ID=83770225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210894660.3A Pending CN115269871A (en) 2022-07-28 2022-07-28 Enterprise knowledge graph optimization method, system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115269871A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination