CN116881430B - Industrial chain identification method and device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN116881430B
Authority
CN
China
Prior art keywords
enterprise
information
target
query
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311152477.7A
Other languages
Chinese (zh)
Other versions
CN116881430A (en)
Inventor
孙会峰
邢婷
邵冰清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shangqi Digital Technology Co ltd
Original Assignee
Beijing Shangqi Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shangqi Digital Technology Co ltd
Priority to CN202311152477.7A
Publication of CN116881430A
Application granted
Publication of CN116881430B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention discloses an industrial chain identification method and device, electronic equipment, and a readable storage medium, and relates to the technical field of computers. The method comprises: collecting an original data set from a plurality of data sources; cleaning the original data set; screening the cleaned data to obtain enterprise-related data; extracting target enterprise information and target inter-enterprise association information from the enterprise-related data; creating an enterprise knowledge graph according to the target enterprise information and the target inter-enterprise association information; mining industry chains among enterprises in the enterprise knowledge graph; receiving query enterprise information input by a user; and identifying, in the enterprise knowledge graph, the industry chain corresponding to the query enterprise information. By jointly considering the multidimensional information of enterprises and the various relations between enterprises, the invention builds a more complete knowledge graph; because more complete information is considered, the industrial chain to which an enterprise belongs is judged more accurately and with higher confidence.

Description

Industrial chain identification method and device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an industrial chain identification method, an apparatus, an electronic device, and a readable storage medium.
Background
The industry chain to which an enterprise belongs objectively reflects the business capability of the enterprise and can serve as an important reference when identifying enterprise risk, so it is necessary to identify the industry chain to which an enterprise belongs.
At present, industrial chain identification technology for enterprises has been studied extensively in China, for example:
Chinese patent application CN111915191A discloses an industrial chain identification method, which comprises: determining at least one associated enterprise associated with a target enterprise according to acquired association information of the target enterprise; determining at least one on-chain enterprise of the target enterprise from the at least one associated enterprise based on a product relationship between the target enterprise and the associated enterprise, the on-chain enterprise being an upstream enterprise or a downstream enterprise of the target enterprise in the industrial chain; and, for each on-chain enterprise, determining the upstream and downstream relation between the on-chain enterprise and the target enterprise according to the enterprise information of the on-chain enterprise and the target enterprise.
However, this method identifies the industrial chain to which the enterprise belongs based only on the product relationship between the target enterprise and the associated enterprise. Because only a single identification factor is considered, the confidence of the judgment is low.
Chinese patent application CN113792158A discloses an industrial chain identification method, which comprises: obtaining transaction flow data and preprocessing it to obtain business transaction flow data and the key information for constructing a fund graph; constructing the fund graph based on the business transaction flow data and the key information; based on the fund graph, mining industry sequences that meet a support threshold by using an association mining algorithm to obtain a set of frequent industry sequences; and taking the k industry sequences with the highest industrial relevance as chain-forming sequences to obtain and display an industrial chain graph.
However, because the knowledge graph is constructed only from business transaction flow data, a single factor is considered in its construction. The resulting knowledge graph lacks sufficient relation data and information data to support an accurate judgment of the industry chain to which an enterprise belongs, so the confidence of that judgment is low.
Disclosure of Invention
In order to solve the problem of low confidence in judging the industrial chain to which an enterprise belongs in the prior art, the invention provides the following technical scheme.
The present invention provides, in a first aspect, an industrial chain identification method, including:
collecting an original data set from a plurality of data sources;
performing data cleaning on the original data set;
screening the cleaned data to obtain data related to enterprises;
extracting target enterprise information and target inter-enterprise association information from the enterprise-related data;
creating an enterprise knowledge graph according to the target enterprise information and the target inter-enterprise association information;
mining industry chains among enterprises in the enterprise knowledge graph;
receiving query enterprise information input by a user;
and identifying, in the enterprise knowledge graph, an industry chain corresponding to the query enterprise information according to the query enterprise information.
Preferably, collecting the raw data set from a plurality of data sources comprises:
collecting bidding information and bid-winning information released by the target website, management files, industry hotspot information released by various industry information sources, and enterprise information provided by suppliers.
Preferably, the step of screening the cleaned data to obtain data related to the enterprise includes:
screening the cleaned data through a pre-established enterprise screening model trained on a feedforward neural network framework, to obtain the enterprise-related data.
Preferably, extracting the target enterprise information and the target inter-enterprise association information from the enterprise-related data includes:
extracting the target enterprise information and the target inter-enterprise association information from the enterprise-related data in a plurality of ways, and storing both in a knowledge graph library.
Preferably, extracting the target enterprise information and the target inter-enterprise association information from the enterprise-related data in a plurality of ways includes:
processing the screened unstructured enterprise-related data through a pre-established information extraction model based on a BERT named entity recognition framework to obtain the target enterprise information and the target inter-enterprise association information; or
cleaning the screened structured enterprise-related data through normalization, dimension reduction, and deduplication to obtain the target enterprise information and the target inter-enterprise association information.
Preferably, creating an enterprise knowledge graph according to the target enterprise information and the target inter-enterprise association information includes:
integrating the target enterprise information and the target inter-enterprise association information;
constructing a schema of the enterprise knowledge graph based on the Resource Description Framework Schema (RDFS);
and forming the target enterprise information and the target inter-enterprise association information into triples according to the created schema of the enterprise knowledge graph, and storing the triples.
Preferably, mining the industry chain between enterprises in the enterprise knowledge graph includes:
determining, for each enterprise node within the enterprise knowledge graph, a shortest path from the enterprise node to each enterprise node having a connection path therewith;
calculating the edge betweenness of each undirected edge in the shortest path;
for each undirected edge, summing the edge betweenness calculated for all enterprise nodes corresponding to the undirected edge to obtain the total edge betweenness of the undirected edge;
deleting the undirected edge with the largest total edge betweenness;
repeating the above steps for the remaining enterprise nodes having connection paths until a plurality of enterprise clusters are formed;
wherein the enterprise nodes in one enterprise cluster belong to the same industry chain, and different enterprise clusters represent different types of industry chains.
Preferably, the enterprise knowledge graph includes: enterprise information;
identifying an industry chain corresponding to the query enterprise information in the enterprise knowledge graph according to the query enterprise information includes:
for each enterprise node in the enterprise knowledge graph, calculating the similarity between the enterprise information in the enterprise node and the query enterprise information;
and taking the industry chain of the enterprise node whose enterprise information has the highest similarity to the query enterprise information as the industry chain of the query enterprise information.
Preferably, calculating the similarity between the enterprise information in the enterprise node and the query enterprise information includes:
performing token initialization on the enterprise information and the query enterprise information;
performing word segmentation on the enterprise information and the query enterprise information, and establishing word indexes;
generating position codes for the enterprise information and the query enterprise information according to the word indexes obtained after word segmentation;
converting the codes containing context position information into tensor data to generate word vectors containing the context position information;
calculating, through the Euclidean distance formula $D = \sqrt{\sum_{i} (X_i - Y_i)^2}$, the similarity between the enterprise information in the enterprise node and the query enterprise information, wherein $D$ represents the similarity between the enterprise information in the enterprise node and the query enterprise information, $X_i$ is the X tensor at position $i$, and $Y_i$ is the Y tensor at position $i$.
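As a non-limiting illustration, the similarity calculation and the selection of the closest enterprise node can be sketched in Python as follows. The sketch rests on assumptions not stated in the text: simple bag-of-words count vectors stand in for the word vectors containing context position information, the node with the smallest Euclidean distance D is treated as the most similar, and the function names and sample data are hypothetical.

```python
import math

def euclidean_distance(x, y):
    """D = sqrt(sum_i (x_i - y_i)^2) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def bag_of_words(text, vocabulary):
    """Toy stand-in for a word vector: one count per vocabulary word."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocabulary]

def identify_chain(query, nodes):
    """nodes: list of (enterprise_info_text, industry_chain_label) pairs."""
    words = set(query.lower().split())
    for text, _ in nodes:
        words.update(text.lower().split())
    vocabulary = sorted(words)
    q = bag_of_words(query, vocabulary)
    # Assumption: the smallest distance means the highest similarity.
    best = min(nodes, key=lambda n: euclidean_distance(q, bag_of_words(n[0], vocabulary)))
    return best[1]

# Hypothetical enterprise nodes, each already assigned to an industry chain.
nodes = [
    ("lithium battery cathode materials", "new-energy vehicle chain"),
    ("wafer fabrication and chip packaging", "semiconductor chain"),
]
print(identify_chain("chip packaging services", nodes))
```

In a real system the vectors would come from the tokenization and position-encoding steps listed above rather than from word counts; only the distance computation and the arg-min selection carry over unchanged.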
A second aspect of the present invention provides an industrial chain identification device, including:
the acquisition module is used for acquiring an original data set from a plurality of data sources;
the cleaning module is used for cleaning the data of the original data set;
the screening module is used for screening the cleaned data to obtain data related to enterprises;
the extraction module is used for extracting target enterprise information and target inter-enterprise association information from the enterprise-related data;
the creating module is used for creating an enterprise knowledge graph according to the target enterprise information and the target inter-enterprise association information;
the mining module is used for mining industry chains among enterprises in the enterprise knowledge graph;
the receiving module is used for receiving query enterprise information input by a user;
and the query module is used for identifying an industry chain corresponding to the query enterprise information in the enterprise knowledge graph according to the query enterprise information.
A third aspect of the invention provides an electronic device comprising a processor and a memory, the memory storing a plurality of instructions, the processor being for reading the instructions and performing the method of the first aspect.
A fourth aspect of the invention provides a computer readable storage medium storing a plurality of instructions, the instructions being readable by a processor to carry out the method of the first aspect.
The beneficial effects of the invention are as follows: a knowledge graph is built according to the target enterprise information and the target inter-enterprise association information, jointly considering the multidimensional information of enterprises and the various relations between enterprises. The resulting knowledge graph is more complete, and because more complete information is considered, the industrial chain to which an enterprise belongs is judged more accurately and with higher confidence.
Drawings
Fig. 1 is a flowchart of an industrial chain identification method according to the present invention.
Fig. 2 is a block diagram of an industrial chain recognition device according to the present invention.
Detailed Description
In order that the above technical solutions may be better understood, they are described in detail below with reference to the drawings and specific embodiments.
The method provided by the invention can be implemented in a terminal environment, and the terminal can comprise one or more of the following components: processor, memory and display screen. Wherein the memory stores at least one instruction that is loaded and executed by the processor to implement the method described in the embodiments below.
The processor may include one or more processing cores. The processor connects the various parts of the terminal through various interfaces and lines, and performs the functions of the terminal and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory and by invoking data stored in the memory.
The memory may include random access memory (Random Access Memory, RAM) or read-only memory (Read-Only Memory, ROM). The memory may be used to store instructions, programs, code, code sets, or instruction sets.
The display screen is used for displaying a user interface of each application program.
In addition, it will be appreciated by those skilled in the art that the structure of the terminal described above is not limiting and that the terminal may include more or fewer components, or may combine certain components, or a different arrangement of components. For example, the terminal further includes components such as a radio frequency circuit, an input unit, a sensor, an audio circuit, a power supply, and the like, which are not described herein.
Example 1
As shown in Fig. 1, a first aspect of the present invention provides an industrial chain identification method, including:
S101: An original data set is collected from a plurality of data sources.
In the embodiment of the invention, the data sources may be bidding information and bid-winning information issued by the target website, management files, industry hotspot information issued by various industry information sources, and enterprise information provided by suppliers, and may also be enterprise statistical information provided by major organizations. The data sources can be selected according to actual conditions and are not described in detail here.
Since the original data set is collected from multiple data sources, in an embodiment of the present invention, the original data set includes: bidding information and bid-winning information released by the target website, administrative files, industry hotspot information released by various industry information sources, enterprise information provided by suppliers, and enterprise statistical information provided by major organizations.
S102: Data cleaning is performed on the original data set.
Furthermore, because the information issued by the data sources differs in format and mode of expression and may contain repeated content, in the embodiment of the invention the collected original data set needs to be cleaned to obtain an original data set that is uniform and meets the data specification.
It should be noted that, in the embodiment of the present invention, normalization, dimension reduction, and deduplication may be used to clean the collected original data set.
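A minimal Python sketch of such a cleaning pass is given below for illustration only. It assumes that records arrive as dictionaries, that "reduction" can be approximated by keeping a fixed subset of fields, and that two records are duplicates when their normalized fields match case-insensitively; the field names `title` and `body` are hypothetical.

```python
import unicodedata

KEEP_FIELDS = ("title", "body")  # hypothetical fields retained after reduction

def normalize(text):
    """Unify full-width/half-width characters and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split())

def clean(records):
    """Normalize each record, drop unused fields, and remove duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        reduced = {field: normalize(record.get(field, "")) for field in KEEP_FIELDS}
        key = tuple(reduced[field].lower() for field in KEEP_FIELDS)
        if key in seen:
            continue  # same content already kept once
        seen.add(key)
        cleaned.append(reduced)
    return cleaned

raw = [
    {"title": "Bid  award notice", "body": "Supplier A wins.", "html": "<div>"},
    {"title": "bid award notice", "body": "Supplier A  wins.", "html": "<p>"},
    {"title": "Industry news", "body": "New standard released."},
]
print(clean(raw))
```

The first two records differ only in spacing, letter case, and a discarded field, so only one of them survives.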
S103: The cleaned data is screened to obtain the enterprise-related data.
Further, the knowledge graph is constructed from enterprise-related data, for example, the name of the enterprise, the legal person of the enterprise, the funding line of the enterprise, the operation scale of the enterprise, the latest technology of the enterprise, and the patents applied for by the enterprise. The original data set cleaned in step S102 may contain both data related to enterprises and data unrelated to enterprises. Therefore, in the embodiment of the invention, the cleaned data needs to be screened to obtain the enterprise-related data.
Accordingly, the embodiment of the invention provides a specific implementation for screening the cleaned data to obtain the enterprise-related data: screening the cleaned data through a pre-established enterprise screening model trained on a feedforward neural network framework.
The enterprise screening model is trained on a feedforward neural network framework and judges whether input information is enterprise-related data, which improves the accuracy of enterprise-related data screening.
In the embodiment of the invention, the training flow of the enterprise screening model is as follows. First, experts label whether each piece of collected enterprise sample data is related to enterprises; the data samples should be sufficiently rich and the labels sufficiently accurate. The labeled data is then divided into a training data set and a test data set, and the labeled training data is segmented into words to generate tensor data containing the text information. Next, the tensor data and the label information are fed into a feedforward neural network training framework to train an enterprise-relevance classification model. Finally, the test data set is used to evaluate the enterprise-relevance classification model.
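The training flow above can be sketched, purely as an illustration, with a tiny feedforward network written in plain Python. Everything concrete here is an assumption: whitespace segmentation and binary word-presence vectors stand in for the tensor data, the network has one tanh hidden layer and a sigmoid output, and the labeled samples, class name, and hyperparameters are hypothetical.

```python
import math
import random

def featurize(text, vocabulary):
    """Segment on whitespace and mark which vocabulary words occur."""
    tokens = set(text.lower().split())
    return [1.0 if word in tokens else 0.0 for word in vocabulary]

class FeedForward:
    """One tanh hidden layer followed by a single sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        z = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, 1.0 / (1.0 + math.exp(-z))

    def train_step(self, x, y, lr=0.1):
        hidden, out = self.forward(x)
        d_out = out - y  # gradient of cross-entropy w.r.t. the output pre-activation
        for j, h in enumerate(hidden):
            d_hidden = d_out * self.w2[j] * (1.0 - h * h)
            self.w2[j] -= lr * d_out * h
            self.b1[j] -= lr * d_hidden
            for i, v in enumerate(x):
                self.w1[j][i] -= lr * d_hidden * v
        self.b2 -= lr * d_out

    def predict(self, x):
        return self.forward(x)[1] >= 0.5

def mean_loss(model, data, vocabulary):
    """Average cross-entropy loss over labeled (text, label) pairs."""
    total = 0.0
    for text, label in data:
        p = model.forward(featurize(text, vocabulary))[1]
        total -= math.log(p) if label else math.log(1.0 - p)
    return total / len(data)

# Hypothetical expert-labeled samples: 1 = enterprise-related, 0 = not.
labelled = [
    ("acme corp wins supply contract", 1),
    ("city marathon route announced", 0),
    ("beta ltd files new patent", 1),
    ("weather forecast for weekend", 0),
    ("gamma corp wins patent contract", 1),
    ("weekend marathon forecast announced", 0),
]
train, test = labelled[:4], labelled[4:]
vocabulary = sorted({w for text, _ in train for w in text.split()})

model = FeedForward(len(vocabulary), n_hidden=4)
loss_before = mean_loss(model, train, vocabulary)
for _ in range(300):
    for text, label in train:
        model.train_step(featurize(text, vocabulary), label)
loss_after = mean_loss(model, train, vocabulary)

# Evaluate on the held-out test set, as in the last step of the flow.
hits = sum(model.predict(featurize(t, vocabulary)) == bool(y) for t, y in test)
print(f"train loss {loss_before:.3f} -> {loss_after:.3f}, test accuracy {hits / len(test):.2f}")
```

A production model would use a deep-learning framework and far more labeled data; the sketch only mirrors the sequence label, split, vectorize, train, evaluate.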
S104: Target enterprise information and target inter-enterprise association information are extracted from the enterprise-related data.
Further, after the enterprise-related data is obtained in step S103, the embodiment of the present invention needs to extract the target enterprise information and the target inter-enterprise association information from it.
The target enterprise information refers to, for a given enterprise, the enterprise name, the products it deals in, its legal person, the patents it has applied for, its latest technology, the projects it has applied for, its technical keywords, its profile, and the like; the target inter-enterprise association information refers to supply relationships, competition relationships, cooperation relationships, commission relationships, superior-subordinate relationships, and the like between enterprises.
Further, in order to improve the accuracy of enterprise information extraction, the embodiment of the invention extracts the target enterprise information and the target inter-enterprise association information from the enterprise-related data in a plurality of ways and stores them in a knowledge graph library.
Based on the principle of information extraction, the embodiment of the invention provides the following ways of extracting the target enterprise information and the target inter-enterprise association information:
The first extraction method: the unstructured enterprise-related data screened by the enterprise screening model is processed by a pre-established information extraction model based on a BERT named entity recognition framework to obtain the target enterprise information and the target inter-enterprise association information.
It should be noted that the information extraction models based on the BERT named entity recognition framework include a model for extracting the target enterprise information and a model for extracting the target inter-enterprise association information. That is, a separate information extraction model based on the BERT named entity recognition framework can be established for each of the two.
In addition, in the embodiment of the invention, an information extraction model based on the BERT named entity recognition framework can be established for each type of enterprise information or inter-enterprise association information and used to extract that type, so that each type of enterprise information or inter-enterprise association information has its own corresponding information extraction model.
In the embodiment of the invention, the information extraction model based on the BERT named entity recognition framework can be an information extraction model based on BERT-BiLSTM-CRF.
Taking the inter-enterprise supply relationship in the inter-enterprise association information as an example, with BERT-BiLSTM-CRF as the named entity recognition framework, an information extraction model for extracting the inter-enterprise supply relationship is established as follows:
Based on the BERT-BiLSTM-CRF model framework, a model is trained on the bidding information and bid-winning information issued by the target website to automatically identify detailed data such as the supplier name, the buyer name, the bid-winning title, and the purchased goods; these data enter the knowledge graph library as supporting data for inter-enterprise supply relationships.
In this process, the BertEmbedding module is first applied to obtain high-quality word vectors from the input bid-winning information. The word vectors are then input into a BiLSTM model training framework, and the output of the BiLSTM module is input into the CRF module, which decodes it into a predicted label sequence. Each entity in the sequence is then extracted and classified, the probability of each result is predicted, and the result with the highest probability is output. In this way a recognition model for supply relationship data is trained.
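The final step, extracting and classifying each entity from the predicted label sequence, can be illustrated with the short Python sketch below. It does not implement BERT, BiLSTM, or the CRF; it assumes the CRF has already produced one label per token in the common BIO scheme (an assumption, since the text does not name a tagging scheme), and the label names `SUPPLIER`, `BUYER`, and `GOODS` and the sample sentence are hypothetical.

```python
def extract_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) pairs."""
    entities = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity still open.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" or an inconsistent "I-" tag ends the open entity.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities

# Hypothetical model output for one bid-winning announcement.
tokens = ["Acme", "Corp", "supplies", "steel", "pipes", "to", "Beta", "Ltd"]
tags = ["B-SUPPLIER", "I-SUPPLIER", "O", "B-GOODS", "I-GOODS", "O", "B-BUYER", "I-BUYER"]

found = dict(extract_entities(tokens, tags))
# One supply relationship, ready to be stored as supporting data.
supply_relation = (found["SUPPLIER"], "supplies", found["BUYER"])
print(supply_relation)
```

Keeping the decoding of labels separate from the neural model makes it easy to reuse the same grouping logic for other entity types.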
It should be noted that the BERT-BiLSTM-CRF-based approach described for the inter-enterprise supply relationship may likewise be applied to other target enterprise information and target inter-enterprise association information.
The second extraction method: the structured enterprise-related data screened by the enterprise screening model is cleaned through normalization, dimension reduction, and deduplication to obtain the target enterprise information and the target inter-enterprise association information.
It should be noted that the target enterprise information may be the patents applied for by an enterprise, the latest technology of the enterprise, the enterprise profile, the technical keywords of the enterprise, and the products of the enterprise; the target inter-enterprise association information may be an enterprise supply relationship, an enterprise cooperation relationship, an enterprise competition relationship, an enterprise commission relationship, a goods circulation relationship, and the like.
It should be noted that the first and second extraction methods may be used in combination or alone; which one is used depends on the data structure of the screened enterprise-related data.
S105: An enterprise knowledge graph is created according to the target enterprise information and the target inter-enterprise association information.
Further, after the target enterprise information and the target inter-enterprise association information are extracted, the enterprise knowledge graph can be created from them.
Accordingly, the embodiment of the invention provides a specific implementation for creating the enterprise knowledge graph according to the target enterprise information and the target inter-enterprise association information:
integrating the target enterprise information and the target inter-enterprise association information; constructing a schema of the enterprise knowledge graph based on the Resource Description Framework Schema (RDFS); and forming the target enterprise information and the target inter-enterprise association information into triples according to the created schema and storing the triples.
It should be noted that Table 1 is an example schema for constructing the enterprise knowledge graph based on the resource description framework.
TABLE 1
It should be noted that the established enterprise knowledge graph includes a plurality of enterprise nodes. The enterprise nodes represent enterprises together with the patents, standards, and investment and financing related to them; the edges between enterprise nodes represent associations between enterprises, such as supply relationships, competition relationships, and cooperation relationships.
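For illustration, forming triples under a schema can be sketched in Python as follows. The class and property names are hypothetical and are not taken from Table 1; the schema is held in a plain dictionary rather than an RDF store, and domain/range are used here as simple validation checks, which is a simplification (in RDFS proper they drive inference rather than rejection).

```python
# Hypothetical schema: classes, and properties with (domain, range).
SCHEMA = {
    "classes": {"Enterprise", "Patent"},
    "properties": {
        "supplies": ("Enterprise", "Enterprise"),
        "competesWith": ("Enterprise", "Enterprise"),
        "holdsPatent": ("Enterprise", "Patent"),
    },
}

def make_triples(entities, relations):
    """entities: {name: class}; relations: [(subject, predicate, object)]."""
    triples = []
    for name, cls in entities.items():
        if cls not in SCHEMA["classes"]:
            raise ValueError(f"unknown class: {cls}")
        triples.append((name, "rdf:type", cls))
    for subject, predicate, obj in relations:
        if predicate not in SCHEMA["properties"]:
            raise ValueError(f"unknown property: {predicate}")
        domain, range_ = SCHEMA["properties"][predicate]
        if entities.get(subject) != domain or entities.get(obj) != range_:
            raise ValueError(f"{predicate} does not fit {subject} -> {obj}")
        triples.append((subject, predicate, obj))
    return triples

entities = {"Acme Corp": "Enterprise", "Beta Ltd": "Enterprise", "Patent P1": "Patent"}
relations = [
    ("Acme Corp", "supplies", "Beta Ltd"),
    ("Acme Corp", "holdsPatent", "Patent P1"),
]
triples = make_triples(entities, relations)
for triple in triples:
    print(triple)
```

Each stored triple has the form (subject, predicate, object), so enterprise attributes and inter-enterprise relationships share one uniform representation.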
S106: Industry chains among enterprises in the enterprise knowledge graph are mined.
Further, the enterprise knowledge graph records the target enterprise information and the target inter-enterprise association information, and the target enterprises and their enterprise information are linked through the association information into a network of observable association relationships between enterprises. However, when the original data set is collected in step S101, an association relationship that actually exists between two enterprises may fail to be collected. Therefore, in order to complete the target inter-enterprise association information, make the constructed enterprise knowledge graph richer, and make the subsequent identification of the industrial chain to which an enterprise belongs more accurate, the industry chains among enterprises in the enterprise knowledge graph also need to be mined.
In the embodiment of the invention, the mining of the industry chain between enterprises in the enterprise knowledge graph can be specifically as follows:
determining, for each enterprise node within the enterprise knowledge graph, a shortest path from the enterprise node to each enterprise node having a connection path therewith; calculating the edge betweenness of each undirected edge in the shortest path; for each undirected edge, summing the edge betweenness calculated for all enterprise nodes corresponding to the undirected edge to obtain the total edge betweenness of the undirected edge; deleting the undirected edge with the largest total edge betweenness; and repeating the above steps for the remaining enterprise nodes having connection paths until a plurality of enterprise clusters are formed.
It should be noted that this algorithm divides the enterprises in the enterprise knowledge graph into different communities, that is, different enterprise clusters, according to the basic information and association relation information of the enterprises. Each closely connected enterprise cluster is finally retained; the enterprise nodes within one enterprise cluster belong to the same industry chain, and different enterprise clusters represent different types of industry chains.
In addition, because the enterprise knowledge graph comprises a plurality of enterprise nodes, the enterprise nodes represent enterprises, and patents, standards and financing related to the enterprises are provided; the edges between the enterprise nodes represent the association relations among enterprises such as the supply relations, the competition relations, the cooperation relations and the like among the enterprise nodes, the enterprise nodes with the association relations are connected together through undirected edges and have connection paths, so that in order to calculate the shortest path from the enterprise node to a certain enterprise node, the premise is that the enterprise node to the certain enterprise node is provided with the connection paths, the connection paths can be directly connected through undirected edges, for example, the enterprise node A and the enterprise node B are directly connected, or can be connected through other enterprise nodes, for example, the enterprise node A is connected with the enterprise node C, the enterprise node B is connected with the enterprise node C, and then the enterprise node A is indirectly connected with the enterprise node B through the enterprise node C.
Determining the shortest path from the enterprise node to each enterprise node having a connection path therewith means determining every enterprise node that is directly or indirectly connected to the enterprise node, recording each such node as a connection node, and determining the shortest path between the enterprise node and the connection node, where the shortest path is the path from the enterprise node to the connection node that passes through the fewest undirected edges.
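As an illustration of the shortest-path definition above, the following is a minimal sketch that uses breadth-first search, which on an unweighted undirected graph finds a path with the fewest edges. The function name `shortest_path`, the adjacency-dictionary representation, and the sample nodes A, B, C, D are assumptions made for this example; the patent does not prescribe a particular search procedure.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Return one path from source to target that crosses the fewest
    undirected edges, or None when no connection path exists."""
    parents = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical graph: A and B are each linked to C but not to each other,
# and D has no association with any other enterprise node.
graph = {"A": ["C"], "B": ["C"], "C": ["A", "B"], "D": []}

print(shortest_path(graph, "A", "B"))  # ['A', 'C', 'B']
print(shortest_path(graph, "A", "D"))  # None
```

In this sketch, node B is a connection node of A (reached indirectly through C), while D is not a connection node because no connection path exists.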
In addition, according to the definition of network communities, vertices within a community are densely connected, while connections between communities are sparse. This means that only a few channels (namely undirected edges) connect communities to each other, and a path from one community into another must pass through at least one of these channels, so these channels tend to carry the largest total edge betweenness. After the undirected edge with the largest total edge betweenness is deleted, some enterprise nodes are no longer connected by undirected edges and their association is broken; deleting these channels therefore separates the communities.
The above process is then repeated on the enterprise knowledge graph: for each enterprise node in the enterprise knowledge graph, the shortest path from the enterprise node to each enterprise node having a connection path therewith is determined again; the edge betweenness of each undirected edge in the shortest path is calculated; for each undirected edge, the edge betweenness calculated for all enterprise nodes corresponding to the undirected edge is summed to obtain the total edge betweenness of the undirected edge; and the undirected edge with the largest total edge betweenness is deleted. This continues until every vertex in the enterprise knowledge graph forms a community by itself, and the enterprise clusters are then separated according to how closely the communities are connected.
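The procedure described above matches the well-known Girvan-Newman style of community detection (repeatedly removing the edge with the highest edge betweenness). The following is a minimal sketch of it under stated assumptions: the graph is given as a list of undirected edges without self-loops, edge betweenness is accumulated with a Brandes-style breadth-first pass, and the loop stops once a requested number of clusters exists. The stopping rule `target_clusters`, all function names, and the sample enterprises A to F are hypothetical and are not taken from the patent.

```python
from collections import deque


def build_adjacency(edges):
    """Build {node: set(neighbors)} from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_betweenness(adj):
    """Total edge betweenness of every undirected edge."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # BFS from source: count shortest paths and record predecessors.
        dist, sigma, preds = {source: 0}, {source: 1.0}, {source: []}
        order, queue = [], deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    sigma[nb] = 0.0
                    preds[nb] = []
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    sigma[nb] += sigma[node]
                    preds[nb].append(node)
        # Walk back from the farthest nodes, crediting each edge with its
        # share of the shortest paths that run through it.
        delta = {n: 0.0 for n in order}
        for node in reversed(order):
            for p in preds[node]:
                share = sigma[p] / sigma[node] * (1.0 + delta[node])
                score[frozenset((p, node))] += share
                delta[p] += share
    # Every undirected path was counted once from each of its two ends.
    return {edge: s / 2.0 for edge, s in score.items()}


def components(adj):
    """Connected components of the graph, each as a set of nodes."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adj[node] - group)
        seen |= group
        result.append(group)
    return result


def split_into_clusters(edges, target_clusters):
    """Delete the highest-betweenness edge until enough clusters exist."""
    adj = build_adjacency(edges)
    clusters = components(adj)
    while len(clusters) < target_clusters:
        scores = edge_betweenness(adj)
        if not scores:
            break  # no edges left to delete
        u, v = max(scores, key=scores.get)  # edge is a 2-element frozenset
        adj[u].discard(v)
        adj[v].discard(u)
        clusters = components(adj)
    return clusters


# Hypothetical graph: two tightly linked groups joined by one edge C-D.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]
print(sorted(sorted(c) for c in split_into_clusters(edges, 2)))
# [['A', 'B', 'C'], ['D', 'E', 'F']]
```

In this example the single edge C-D lies on every shortest path between the two groups, so it has the largest total edge betweenness and is deleted first, leaving two enterprise clusters.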
Through step S106, the association information among enterprises in the originally established enterprise knowledge graph is enriched and completed, which provides an accurate data source for the subsequent query steps.
In addition, when mining the industry chains among enterprises in the enterprise knowledge graph, besides the method described above, a path analysis algorithm, a similarity calculation algorithm, an association rule mining algorithm, or a graph neural network algorithm may also be used to achieve the purpose of mining the industry chains.
S107: receiving query enterprise information input by a user.
S108: identifying, according to the query enterprise information, the industry chain corresponding to the query enterprise information in the enterprise knowledge graph.
Further, after the enterprise knowledge graph has been mined in greater depth, query enterprise information input by a user is received, and the industry chain corresponding to the query enterprise information is identified in the enterprise knowledge graph according to the query enterprise information.
Here, the query enterprise information includes information such as the name of the queried enterprise, its legal representative, its address, its business scope, its products, and technical keywords, which is not described in detail herein.
In addition, if an enterprise identical or similar to the queried enterprise is found in the enterprise knowledge graph, the industry chain to which the queried enterprise belongs can easily be determined. Therefore, in the embodiment of the invention, identifying the industry chain corresponding to the query enterprise information in the enterprise knowledge graph according to the query enterprise information may specifically be as follows:
for each enterprise node in the enterprise knowledge graph, calculating the similarity between enterprise information in the enterprise node and the query enterprise information; and taking the industry chain of the enterprise node with the highest similarity between the enterprise information and the query enterprise information as the industry chain of the query enterprise information.
It should be noted that, the similarity between the enterprise information in the enterprise node and the query enterprise information is calculated, and specifically the following method may be used:
performing token initialization on the enterprise information and the query enterprise information; performing word segmentation on the enterprise information and the query enterprise information and establishing word indexes; generating position codes of the enterprise information and the query enterprise information according to the word indexes after word segmentation; converting the codes containing the context position information into tensor data to generate word vectors containing the context position information; and calculating, through the Euclidean distance formula $D = \sqrt{\sum_{i} (x_i - y_i)^2}$, the similarity between the enterprise information in the enterprise node and the query enterprise information, wherein $D$ represents the similarity between the enterprise information in the enterprise node and the query enterprise information, $x_i$ is the X tensor at position $i$, and $y_i$ is the Y tensor at position $i$.
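The query steps above can be sketched as follows. This is a simplified illustration, not the patented model: the word vectors here are plain word counts over a shared word index, standing in for the learned, position-aware vectors the text describes, and the enterprise node with the smallest Euclidean distance $D$ is treated as the one with the highest similarity, which is an assumption about how $D$ is ranked. All function names and the sample enterprises are hypothetical.

```python
import math


def build_word_index(texts):
    """Assign each distinct whitespace-separated word an integer index."""
    index = {}
    for text in texts:
        for word in text.lower().split():
            index.setdefault(word, len(index))
    return index


def to_vector(text, index):
    """Word-count vector over the shared index (a stand-in for the
    context-aware word vectors described in the text)."""
    vector = [0.0] * len(index)
    for word in text.lower().split():
        if word in index:
            vector[index[word]] += 1.0
    return vector


def euclidean_distance(x, y):
    """D = sqrt(sum_i (x_i - y_i)^2) for two equal-length vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def identify_chain(query_text, nodes):
    """nodes maps an enterprise name to (enterprise info, industry chain).
    Returns the closest enterprise node and its industry chain."""
    index = build_word_index([query_text] + [info for info, _ in nodes.values()])
    query_vector = to_vector(query_text, index)
    best = min(
        nodes,
        key=lambda name: euclidean_distance(to_vector(nodes[name][0], index),
                                            query_vector),
    )
    return best, nodes[best][1]


# Hypothetical enterprise nodes and industry chain labels.
nodes = {
    "Enterprise A": ("lithium battery cathode materials", "new energy vehicles"),
    "Enterprise B": ("wheat flour milling", "food processing"),
}
print(identify_chain("battery materials supplier", nodes))
# ('Enterprise A', 'new energy vehicles')
```

A learned encoder would replace `to_vector` in practice; the distance calculation and the selection of the closest enterprise node would stay the same.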
According to the method, the knowledge graph is constructed from the target enterprise information and the target inter-enterprise association information, and the multi-dimensional information of each enterprise and the various relationships among enterprises are considered comprehensively. The resulting knowledge graph system is therefore more complete, and because more information is taken into account, the industry chain to which an enterprise belongs is ultimately determined more accurately and with higher confidence.
In addition, by combining the enterprise knowledge graph with a deep learning algorithm, the method can quickly locate the enterprise information submitted by the user, find the enterprise cluster to which the enterprise belongs in the enterprise knowledge graph base, and quickly determine the industry chain of the submitted enterprise according to the industry chain of that enterprise cluster.
In this process, the method improves the efficiency and accuracy of determining the industry chain to which an enterprise belongs in the following ways: (1) An enterprise screening model trained on a feedforward neural network framework automatically performs a first-layer screening of the collected information and retains the useful enterprise-related information. This replaces manual screening and improves the efficiency of the whole flow. (2) An information extraction model based on the BERT named entity recognition framework finds detailed enterprise information (inter-enterprise relationships, enterprise products, technical keywords, and the like) in the various kinds of collected information. This replaces manual extraction, improves the efficiency of the whole flow, and reduces the method's dependence on expert experience. (3) An enterprise knowledge graph base is built from the preprocessed detailed enterprise information, and the relationships among enterprises are mined with a knowledge graph algorithm. With the help of this rich and accurate knowledge graph base, the enterprise input by the user can be located quickly and accurately, similar enterprise clusters can be found, and the industry chain to which those clusters belong can be identified. This further improves the efficiency of the whole flow, while the rich knowledge graph base safeguards accuracy.
Example II
Another aspect of the present invention further includes a functional module architecture that corresponds completely to the industrial chain identification method of the first embodiment, that is, an industrial chain identification device is provided, as shown in fig. 2, including: an acquisition module 201 for acquiring a raw data set from a plurality of data sources; a cleaning module 202, configured to perform data cleaning on the raw data set; the screening module 203 is configured to screen the cleaned data to obtain data related to the enterprise; an extracting module 204, configured to extract target enterprise information and target inter-enterprise association information from the enterprise-related data; a creating module 205, configured to create an enterprise knowledge graph according to the target enterprise information and the target enterprise association information; the mining module 206 is configured to mine an industry chain between enterprises in the enterprise knowledge graph; a receiving module 207, configured to receive query enterprise information input by a user; and the query module 208 is configured to identify, in the enterprise knowledge graph, an industry chain corresponding to the query enterprise information according to the query enterprise information.
The collection module 201 is specifically configured to collect bid information and bid-winning information published by a target website, administrative files, industry hotspot information published by various industry information sources, and enterprise information provided by a provider.
The screening module 203 is specifically configured to screen the cleaned data through a pre-created enterprise screening model based on feedforward neural network framework training, so as to obtain data related to an enterprise.
The extraction module 204 is specifically configured to extract the target enterprise information and the target inter-enterprise association information from the enterprise-related data in a plurality of modes, and to store them in a knowledge graph base.
The extraction module 204 is further configured to extract the screened unstructured enterprise-related data through a pre-created information extraction model based on the BERT named entity recognition framework, so as to obtain the target enterprise information and the target inter-enterprise association information; or to clean the screened structured enterprise-related data through normalization, dimension reduction, and de-duplication processes, so as to obtain the target enterprise information and the target inter-enterprise association information.
The creation module 205 is specifically configured to integrate the target enterprise information and the target inter-enterprise association information; construct a schema of the enterprise knowledge graph based on the Resource Description Framework Schema (RDFS); and, according to the created schema of the enterprise knowledge graph, form the target enterprise information and the target inter-enterprise association information into triples and store the triples.
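To make the triple-forming step concrete, the following is a minimal sketch of turning enterprise records and inter-enterprise relations into (subject, predicate, object) triples that are checked against a predefined vocabulary. The predicate names (`hasProduct`, `suppliesTo`, and so on), the `Enterprise` class label, and the in-memory list used as storage are all assumptions for illustration; the patent does not list the classes or properties of its RDFS-based definition.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical vocabulary standing in for the RDFS-based definition.
ATTRIBUTE_PREDICATES = {"hasProduct", "hasTechKeyword", "locatedAt"}
RELATION_PREDICATES = {"suppliesTo", "competesWith", "cooperatesWith"}


def build_triples(enterprises, relations):
    """Form triples from enterprise records and inter-enterprise relations,
    keeping only predicates that the vocabulary defines."""
    triples = []
    for name, attributes in enterprises.items():
        triples.append(Triple(name, "rdf:type", "Enterprise"))
        for predicate, values in attributes.items():
            if predicate not in ATTRIBUTE_PREDICATES:
                continue  # attribute not defined in the vocabulary
            for value in values:
                triples.append(Triple(name, predicate, value))
    for subject, predicate, obj in relations:
        # Keep only defined relations between enterprises that are known.
        if (predicate in RELATION_PREDICATES
                and subject in enterprises and obj in enterprises):
            triples.append(Triple(subject, predicate, obj))
    return triples


# Hypothetical input data.
enterprises = {
    "Enterprise A": {"hasProduct": ["battery cells"], "nickname": ["A Co"]},
    "Enterprise B": {"hasProduct": ["electric vehicles"]},
}
relations = [
    ("Enterprise A", "suppliesTo", "Enterprise B"),
    ("Enterprise A", "suppliesTo", "Enterprise Z"),  # unknown enterprise, dropped
]

triples = build_triples(enterprises, relations)
for triple in triples:
    print(triple)
```

Checking each predicate against the vocabulary before storing keeps the stored triples consistent with the graph definition; a production system would typically persist them in a triple store or graph database rather than a Python list.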
The mining module 206 is specifically configured to determine, for each enterprise node in the enterprise knowledge graph, a shortest path from the enterprise node to each enterprise node having a connection path with the enterprise node; calculate the edge betweenness of each undirected edge in the shortest path; for each undirected edge, sum the edge betweenness calculated for all enterprise nodes corresponding to the undirected edge to obtain the total edge betweenness of the undirected edge; delete the undirected edge with the largest total edge betweenness; and repeat the above steps for the remaining enterprise nodes having connection paths until a plurality of enterprise clusters are formed; each enterprise node in an enterprise cluster belongs to the same industry chain, and different enterprise clusters represent different types of industry chains.
The enterprise knowledge graph comprises: enterprise information;
the query module 208 is specifically configured to calculate, for each enterprise node in the enterprise knowledge graph, a similarity between enterprise information in the enterprise node and the query enterprise information; and taking the industry chain of the enterprise node with the highest similarity between the enterprise information and the query enterprise information as the industry chain of the query enterprise information.
The query module 208 is further configured to perform token initialization on the enterprise information and the query enterprise information; perform word segmentation on the enterprise information and the query enterprise information and establish word indexes; generate position codes of the enterprise information and the query enterprise information according to the word indexes after word segmentation; convert the codes containing the context position information into tensor data to generate word vectors containing the context position information; and calculate, through the Euclidean distance formula $D = \sqrt{\sum_{i} (x_i - y_i)^2}$, the similarity between the enterprise information in the enterprise node and the query enterprise information, wherein $D$ represents the similarity between the enterprise information in the enterprise node and the query enterprise information, $x_i$ is the X tensor at position $i$, and $y_i$ is the Y tensor at position $i$.
The device may be implemented by the industrial chain identification method provided in the first embodiment, and a specific implementation manner may be referred to the description in the first embodiment, which is not repeated herein.
Example III
The invention also provides an electronic device comprising a processor and a memory, the memory storing a plurality of instructions, the processor being configured to read the instructions and perform any of the methods of the previous embodiments. Wherein the processor and the memory may be connected by a bus or otherwise, for example by a bus connection. The processor may be a central processing unit (Central Processing Unit, CPU). The processor may also be any other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof.
The memory is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the methods in embodiments of the present application. The processor executes various functional applications of the processor and data processing, i.e., implements the methods of the method embodiments described above, by running non-transitory software programs, instructions, and modules stored in memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system and at least one application program required for a function; the storage data area may store data created by the processor, and the like. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory may optionally include memory located remotely from the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Example IV
The invention also provides an electronic device comprising a processor and a memory coupled to the processor, the memory storing a plurality of instructions loadable and executable by the processor to enable the processor to perform any one of the methods as in embodiment one. The computer-readable storage medium may be a tangible storage medium such as random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a floppy disk, a hard disk, a removable memory disk, a CD-ROM, or any other form of storage medium known in the art.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (9)

1. An industrial chain identification method, comprising:
collecting an original data set from a plurality of data sources;
performing data cleaning on the original data set;
screening the cleaned data to obtain data related to enterprises;
extracting target enterprise information and target inter-enterprise association information from the enterprise-related data;
creating an enterprise knowledge graph according to the target enterprise information and the target enterprise association information;
mining an industry chain among enterprises in the enterprise knowledge graph;
receiving query enterprise information input by a user;
identifying an industry chain corresponding to the query enterprise information in the enterprise knowledge graph according to the query enterprise information;
the extracting the target enterprise information and the target inter-enterprise association information from the enterprise-related data by using a plurality of modes comprises:
extracting the screened unstructured enterprise-related data through a pre-established information extraction model based on a BERT named entity recognition framework to obtain the target enterprise information and the target inter-enterprise association information; or
cleaning the screened structured enterprise-related data through normalization, dimension reduction, and de-duplication processes to obtain the target enterprise information and the target inter-enterprise association information;
the information extraction model based on the BERT named entity recognition framework is based on BERT-BiLSTM-CRF; an information extraction model based on BERT-BiLSTM-CRF for extracting the supply relationships between enterprises is established by the following process:
based on the BERT-BiLSTM-CRF model framework, training a model for automatically identifying the name of a supplier, the name of a buyer, the title of a bid, and/or the purchased goods according to bid information and bid-winning information, the identified results entering a knowledge graph base as supporting data for the supply relationships between enterprises; firstly, a BertEmbeddings module is applied to obtain high-quality word vectors from the input bid-winning information; the newly obtained word vectors are then passed to a pre-written BiLSTM model training framework; the output of the BiLSTM module is passed to a CRF module, which decodes it and generates a predicted labeling sequence; each entity in the sequence is extracted and classified, the occurrence probability of each result is predicted, and the result with the maximum probability is output; finally, a recognition model for supply relationship data is trained;
the enterprise knowledge graph comprises: enterprise information;
the identifying, according to the query enterprise information, an industry chain corresponding to the query enterprise information in the enterprise knowledge graph comprises:
For each enterprise node in the enterprise knowledge graph, calculating the similarity between enterprise information in the enterprise node and the query enterprise information;
taking an industry chain to which the enterprise node with the highest similarity between the enterprise information and the query enterprise information belongs as the industry chain for querying the enterprise information;
the calculating the similarity between the enterprise information in the enterprise node and the query enterprise information includes:
carrying out token initialization on the enterprise information and the query enterprise information;
word segmentation is carried out on the enterprise information and the query enterprise information, and word indexes are established;
generating the enterprise information and the position code of the query enterprise information according to the word index after word segmentation;
converting the codes containing the context position information into tensor data to generate word vectors containing the context position information;
calculating, through the Euclidean distance formula $D = \sqrt{\sum_{i} (x_i - y_i)^2}$, the similarity between the enterprise information in the enterprise node and the query enterprise information, wherein $D$ represents the similarity between the enterprise information in the enterprise node and the query enterprise information, $x_i$ is the X tensor at position $i$, and $y_i$ is the Y tensor at position $i$.
2. The method of claim 1, wherein collecting the raw data set from a plurality of data sources comprises:
collecting bid information and bid-winning information released by the target website, administrative files, industry hotspot information released by various industry information sources, and enterprise information provided by suppliers.
3. The method of claim 1, wherein screening the cleaned data to obtain enterprise-related data comprises:
screening the cleaned data through a pre-established enterprise screening model trained on a feedforward neural network framework to obtain data related to enterprises.
4. The method of claim 1, wherein extracting target business information and target inter-business association information from the business-related data comprises:
extracting the target enterprise information and the target inter-enterprise association information from the enterprise-related data in a plurality of modes, and storing them in a knowledge graph base.
5. The method of claim 1, wherein creating an enterprise knowledge-graph based on the target enterprise information and target inter-enterprise association information comprises:
integrating the target enterprise information and the target enterprise associated information;
constructing a schema of the enterprise knowledge graph based on the Resource Description Framework Schema (RDFS);
forming the target enterprise information and the target inter-enterprise association information into triples according to the created schema of the enterprise knowledge graph, and storing the triples.
6. The method of claim 1, wherein mining the industry chain between businesses within the business knowledge graph comprises:
determining, for each enterprise node within the enterprise knowledge graph, a shortest path from the enterprise node to each enterprise node having a connection path therewith;
calculating the edge betweenness of each undirected edge in the shortest path;
for each undirected edge, summing the edge betweenness calculated for all enterprise nodes corresponding to the undirected edge to obtain the total edge betweenness of the undirected edge;
deleting the undirected edge with the largest total edge betweenness;
repeating the step of mining the industrial chain among enterprises in the enterprise knowledge graph for the rest enterprise nodes with the connecting paths until a plurality of enterprise clusters are formed;
each enterprise node in the enterprise cluster belongs to the same industry chain, and different enterprise clusters represent different types of industry chains.
7. An industrial chain identification device, comprising:
The acquisition module is used for acquiring an original data set from a plurality of data sources;
the cleaning module is used for cleaning the data of the original data set;
the screening module is used for screening the cleaned data to obtain data related to enterprises;
the extraction module is used for extracting target enterprise information and target inter-enterprise association information from the enterprise-related data;
the creating module is used for creating an enterprise knowledge graph according to the target enterprise information and the target enterprise association information;
the mining module is used for mining industry chains among enterprises in the enterprise knowledge graph;
the receiving module is used for receiving inquiry enterprise information input by a user;
the query module is used for identifying an industry chain corresponding to the query enterprise information in the enterprise knowledge graph according to the query enterprise information;
the extracting the target enterprise information and the target inter-enterprise association information from the enterprise-related data by using a plurality of modes comprises:
extracting the screened unstructured enterprise-related data through a pre-established information extraction model based on a BERT named entity recognition framework to obtain the target enterprise information and the target inter-enterprise association information; or
cleaning the screened structured enterprise-related data through normalization, dimension reduction, and de-duplication processes to obtain the target enterprise information and the target inter-enterprise association information;
the information extraction model based on the BERT named entity recognition framework is based on BERT-BiLSTM-CRF; an information extraction model based on BERT-BiLSTM-CRF for extracting the supply relationships between enterprises is established by the following process:
based on the BERT-BiLSTM-CRF model framework, training a model for automatically identifying the name of a supplier, the name of a buyer, the title of a bid, and/or the purchased goods according to bid information and bid-winning information, the identified results entering a knowledge graph base as supporting data for the supply relationships between enterprises; firstly, a BertEmbeddings module is applied to obtain high-quality word vectors from the input bid-winning information; the newly obtained word vectors are then passed to a pre-written BiLSTM model training framework; the output of the BiLSTM module is passed to a CRF module, which decodes it and generates a predicted labeling sequence; each entity in the sequence is extracted and classified, the occurrence probability of each result is predicted, and the result with the maximum probability is output; finally, a recognition model for supply relationship data is trained;
The enterprise knowledge graph comprises: enterprise information;
the identifying, according to the query enterprise information, an industry chain corresponding to the query enterprise information in the enterprise knowledge graph comprises:
for each enterprise node in the enterprise knowledge graph, calculating the similarity between enterprise information in the enterprise node and the query enterprise information;
taking an industry chain to which the enterprise node with the highest similarity between the enterprise information and the query enterprise information belongs as the industry chain for querying the enterprise information;
the calculating the similarity between the enterprise information in the enterprise node and the query enterprise information includes:
carrying out token initialization on the enterprise information and the query enterprise information;
word segmentation is carried out on the enterprise information and the query enterprise information, and word indexes are established;
generating the enterprise information and the position code of the query enterprise information according to the word index after word segmentation;
converting the codes containing the context position information into tensor data to generate word vectors containing the context position information;
calculating, through the Euclidean distance formula $D = \sqrt{\sum_{i} (x_i - y_i)^2}$, the similarity between the enterprise information in the enterprise node and the query enterprise information, wherein $D$ represents the similarity between the enterprise information in the enterprise node and the query enterprise information, $x_i$ is the X tensor at position $i$, and $y_i$ is the Y tensor at position $i$.
8. An electronic device comprising a processor and a memory, the memory storing a plurality of instructions, the processor configured to read the instructions and perform the industrial chain identification method of any one of claims 1-6.
9. A computer-readable storage medium storing a plurality of instructions, the instructions being readable by a processor to perform the industrial chain identification method according to any one of claims 1 to 6.
CN202311152477.7A 2023-09-07 2023-09-07 Industrial chain identification method and device, electronic equipment and readable storage medium Active CN116881430B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311152477.7A CN116881430B (en) 2023-09-07 2023-09-07 Industrial chain identification method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN116881430A CN116881430A (en) 2023-10-13
CN116881430B true CN116881430B (en) 2023-12-12

Family

ID=88272205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311152477.7A Active CN116881430B (en) 2023-09-07 2023-09-07 Industrial chain identification method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116881430B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117435777A (en) * 2023-12-20 2024-01-23 烟台云朵软件有限公司 Automatic construction method and system for industrial chain map

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108376371A (en) * 2018-02-02 2018-08-07 众安信息技术服务有限公司 Internet insurance marketing method and system based on social networks
CN112183747A (en) * 2020-09-29 2021-01-05 华为技术有限公司 Neural network training method, neural network compression method and related equipment
CN113361542A (en) * 2021-06-02 2021-09-07 合肥工业大学 Local feature extraction method based on deep learning
CN113505242A (en) * 2021-07-16 2021-10-15 珍岛信息技术(上海)股份有限公司 Method and system for automatically embedding knowledge graph
CN114428864A (en) * 2022-04-01 2022-05-03 杭州未名信科科技有限公司 Knowledge graph construction method and device, electronic equipment and medium
CN116206306A (en) * 2022-12-26 2023-06-02 山东科技大学 Inter-category characterization contrast driven graph convolution point cloud semantic annotation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220019740A1 (en) * 2020-07-20 2022-01-20 Microsoft Technology Licensing, Llc Enterprise knowledge graphs using enterprise named entity recognition

Also Published As

Publication number Publication date
CN116881430A (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN108038183B (en) Structured entity recording method, device, server and storage medium
WO2019214245A1 (en) Information pushing method and apparatus, and terminal device and storage medium
CN111612039B (en) Abnormal user identification method and device, storage medium and electronic equipment
CN107657267B (en) Product potential user mining method and device
CN104573130B (en) The entity resolution method and device calculated based on colony
CN108021651B (en) Network public opinion risk assessment method and device
CN108027814B (en) Stop word recognition method and device
CN113610239A (en) Feature processing method and feature processing system for machine learning
CN112163424A (en) Data labeling method, device, equipment and medium
CN116881430B (en) Industrial chain identification method and device, electronic equipment and readable storage medium
CN109800354B (en) Resume modification intention identification method and system based on block chain storage
CN115098556A (en) User demand matching method and device, electronic equipment and storage medium
CN114461644A (en) Data acquisition method and device, electronic equipment and storage medium
CN106980639B (en) Short text data aggregation system and method
CN112363996B (en) Method, system and medium for establishing physical model of power grid knowledge graph
CN110147482B (en) Method and device for acquiring burst hotspot theme
CN110674290B (en) Relationship prediction method, device and storage medium for overlapping community discovery
CN113157978A (en) Data label establishing method and device
CN105468658B (en) Data cleaning method and device
CN109144999B (en) Data positioning method, device, storage medium and program product
CN114090601B (en) Data screening method, device, equipment and storage medium
CN116226526A (en) Intellectual property intelligent retrieval platform and method
CN116260866A (en) Government information pushing method and device based on machine learning and computer equipment
CN110062112A (en) Data processing method, device, equipment and computer readable storage medium
CN110737749B (en) Entrepreneurship plan evaluation method, entrepreneurship plan evaluation device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant