CN116842406A

CN116842406A - Industrial chain network construction method, storage medium and system based on bidding information

Info

Publication number: CN116842406A
Application number: CN202310798804.XA
Authority: CN
Inventors: 赵永国; 曹熙; 蔡露; 程菊花; 余建纯; 李文杰; 曾祥清; 胡彩倩; 倪沛权; 王雪纯; 韩庭钰; 戴渝卓
Original assignee: China Southern Power Grid Big Data Service Co ltd
Current assignee: China Southern Power Grid Big Data Service Co ltd
Priority date: 2023-06-30
Filing date: 2023-06-30
Publication date: 2023-10-03

Abstract

The invention provides an industrial chain network construction method, a storage medium and a system based on bidding information, wherein the method comprises the following steps: collecting a plurality of pieces of enterprise information, wherein each piece of enterprise information comprises an enterprise name, operation range information and bidding information; constructing an industry chain network according to the plurality of pieces of enterprise information; identifying core nodes in the industry chain network; identifying the enterprise features corresponding to each node in the industry chain network; and visually updating the core nodes and the enterprise features of each node in the industry chain network. Based on the industry chain network, not only can the industry attribution of the enterprise corresponding to each node and the position of that enterprise in the industry chain be determined, but the core enterprises among all enterprises of the industry chain can also be determined from the core nodes, and the main business information of all enterprises of the industry chain can be determined from the enterprise features of each node. This gives external parties a deeper understanding of the industry chain network and provides effective support for their decision-making.

Description

Industrial chain network construction method, storage medium and system based on bidding information

Technical Field

The invention relates to the technical field of data processing, in particular to an industrial chain network construction method, a storage medium and a system based on bidding information.

Background

The industry chain refers to a system of collaboration and cooperation formed by enterprises such as raw material suppliers, processing manufacturers, final product manufacturers, sellers and service providers, combined in a certain order and manner. With the rise of new-generation information technology, most industries are becoming digitized and networked, so that the form of the industry chain is becoming more diversified and complicated and the coupling among its links is becoming tighter. Governments, enterprises and investors need a deep understanding of the internal structure of the industry chain network and of potential business opportunities in order to formulate sound policies, implement effective strategies and make targeted investments.

Bidding is one of the main ways for enterprises to obtain business opportunities and conduct trade, and bidding information contains a great deal of information about enterprises and about their relationships within the industries and industry chains in which they operate. By mining and analyzing bidding information, an industry chain network between enterprises can therefore be built, helping governments, enterprises and investors to gain insight into the internal structure and potential business opportunities of the industry chain network. At present, an industry chain network constructed from the bidding information of enterprises can generally be used only to determine the industry attribution of an enterprise and its position in the industry chain. Although this helps governments, enterprises and investors to understand the internal structure and potential business opportunities of the industry chain network, the depth of understanding is clearly insufficient for today's more diversified, more complicated and more tightly coupled industry chains, and it is difficult to provide effective support for the decisions of governments, enterprises and investors.

Disclosure of Invention

The technical problem to be solved by the invention is how to deepen the understanding that can be obtained from an industry chain network constructed from bidding information.

In order to solve the technical problems, the invention provides an industrial chain network construction method based on bidding information, which comprises the following steps:

A. collecting a plurality of pieces of enterprise information, wherein each piece of enterprise information comprises an enterprise name, operation range information and bidding information;

B. constructing an industry chain network according to the multiple pieces of enterprise information, wherein nodes of the industry chain network are generated according to enterprise names in each piece of enterprise information, and edges between the nodes are generated according to bidding information in each piece of enterprise information;

C. identifying a core node in the industrial chain network, which specifically comprises the following steps of C1, C2 and C3;

C1. calculating a plurality of centrality metrics for each node in the industry chain network, wherein the plurality of centrality metrics comprises: a degree centrality index representing the number of connections of the node in the industry chain network, a betweenness centrality index representing the number of times the node acts as an intermediate node in the industry chain network, a closeness centrality index representing the average value of the distances from the node to all other nodes, and a clustering coefficient index representing the degree of interconnection between all neighboring nodes of the node;

C2. Giving different weights to each centrality measurement index, and carrying out weighted calculation on each centrality measurement index according to the weights to obtain the core comprehensive score of each node;

C3. calculating the average of the core composite scores that rank within a preset number of top places, and selecting the nodes whose core composite scores are not lower than this average as the core nodes in the industry chain network;

D. identifying enterprise characteristics corresponding to each node in the industrial chain network, wherein the enterprise characteristics specifically comprise the following steps D1, D2 and D3;

D1. performing word segmentation operation on the operation range information of each node in the industrial chain network to obtain a plurality of words;

D2. calculating the term frequency-inverse document frequency (TF-IDF) of each word, and extracting the words whose TF-IDF values rank within a preset number of top places as the operation range keywords of the node;

D3. classifying the operation range keywords of each node by using a trained classification model to obtain the enterprise features of each node;

E. and visually updating the core node and the enterprise characteristics of each node in the industrial chain network.

Preferably, in the step A, the bidding information includes the name of a bidding project related to the enterprise, the bidding unit in the bidding project, and the winning unit; in the step B, when the edges between the nodes are generated according to the bidding information in each piece of enterprise information, the direction of each edge is determined according to the relationship among the bidding project, the bidding unit and the winning unit in the bidding information.

Preferably, in the step a, the number of transactions between enterprises is obtained according to the number of different bidding projects including the same bidding unit and winning unit in the bidding information; in the step B, the bidding project weight among the nodes is calculated according to the transaction times among enterprises, and the thickness of the edges among the nodes is defined according to the calculated bidding project weight.

Preferably, in the step C1:

the calculation formula of the degree centrality index is $DC(i) = \frac{\deg(i)}{n-1}$, wherein DC(i) represents the degree centrality index of node i, deg(i) represents the number of connections between node i and other nodes in the industry chain network, and n represents the number of nodes in the industry chain network;

the calculation formula of the betweenness centrality index is $BC(i) = \sum_{j \neq i \neq k \in V} \frac{\sigma_{jk}(i)}{\sigma_{jk}}$, wherein BC(i) represents the betweenness centrality index of node i, V represents the set of all nodes in the industry chain network, j and k are two different nodes in the set V, σ_jk represents the number of shortest paths between node j and node k, and σ_jk(i) represents the number of those shortest paths between node j and node k in which node i is an intermediate node;

the calculation formula of the closeness centrality index is $AC(i) = \frac{n-1}{\sum_{j \in V,\, j \neq i} d(j,i)}$, wherein AC(i) represents the closeness centrality index of node i, V represents the set of all nodes in the industry chain network, j is any node in the set V other than node i, d(j,i) represents the distance from node i to node j, and n represents the number of nodes in the industry chain network;

the calculation formula of the clustering coefficient index is $CC(i) = \frac{2\,t(i)}{k(i)\,(k(i)-1)}$, wherein CC(i) represents the clustering coefficient index of node i, k(i) represents the number of connections between node i and its neighboring nodes, and t(i) represents the number of edges between all neighboring nodes of node i.

Preferably, in the step C2, the calculation formula of the core composite score is: SC(i) = w₁*DC(i) + w₂*BC(i) + w₃*AC(i) + w₄*CC(i); wherein SC(i) represents the core composite score of node i, DC(i) represents the degree centrality index of node i, BC(i) represents the betweenness centrality index of node i, AC(i) represents the closeness centrality index of node i, CC(i) represents the clustering coefficient index of node i, w₁ represents the weight of the degree centrality index of node i, w₂ represents the weight of the betweenness centrality index of node i, w₃ represents the weight of the closeness centrality index of node i, and w₄ represents the weight of the clustering coefficient index of node i.

Preferably, in the step D2, the calculation formula of the term frequency-inverse document frequency of each word is: TF-IDF = TF × IDF = TF × log(N/(DF+1)); wherein TF-IDF represents the term frequency-inverse document frequency of the word, TF represents the term frequency of the word, IDF represents the inverse document frequency of the word, N represents the total number of pieces of operation range information of all nodes in the industry chain network, and DF represents the number of pieces of operation range information, among all nodes in the industry chain network, that contain the word whose TF-IDF is being calculated.
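The TF-IDF formula of step D2 can be illustrated with a minimal sketch. It assumes that the operation range information has already been segmented into word lists, that TF is the word count divided by the number of words in the same piece of operation range information, and that the logarithm is natural; none of these details is fixed by the text above. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: TF-IDF} dict per document, using TF * log(N / (DF + 1))."""
    n_docs = len(docs)
    # DF: number of documents that contain each word at least once.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            word: (count / total) * math.log(n_docs / (df[word] + 1))
            for word, count in counts.items()
        })
    return scores

# Hypothetical, already segmented operation range information of four nodes.
docs = [
    ["power", "cable", "cable"],
    ["power", "software"],
    ["power", "transformer"],
    ["software", "consulting"],
]
scores = tf_idf(docs)
top_keyword = max(scores[0], key=scores[0].get)
print(top_keyword)
```

In this sample, "power" appears in three of the four documents, so its IDF is log(4/4) = 0 and it contributes nothing as a keyword, while "cable" becomes the top keyword of the first node.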

Preferably, in the step D3, training the classification model includes the following operations: and constructing a training sample set and a test sample set, inputting the training sample set into the classification model for training, testing the trained classification model by using the test sample set, and correcting parameters of the classification model according to test results.

Preferably, in the step E, the enterprise characteristics of the core node and each node are visually updated in the industry chain network, specifically: the node size of the core nodes in the industry chain network is enlarged, and the enterprise characteristics of each node are directly added to each node of the industry chain network.

The present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the industrial chain network construction method as described above.

The invention also provides an industrial chain network construction system based on bidding information, which comprises a processor and the computer readable storage medium.

The invention has the following beneficial effects: after the industry chain network is constructed from the enterprise information containing bidding information, the core nodes in the industry chain network and the enterprise features corresponding to each node are identified, and the core nodes and the enterprise features of each node are then visually updated in the industry chain network. In addition to the industry attribution of the enterprise corresponding to each node and the position of that enterprise in the industry chain, the core enterprises among all enterprises of the industry chain can thus be determined from the core nodes, and the main business information of all enterprises of the industry chain can be determined from the enterprise features of each node. This helps external parties to understand the industry chain network in depth and provides effective support for their decisions.

Drawings

Fig. 1 is a flow chart of an industrial chain network construction method based on bidding information.

Fig. 2 is a schematic diagram of the output results of TF value, IDF value, and TF-IDF value for each word in the operation range information of the node O.

Detailed Description

The invention is further described in detail below in connection with the detailed description.

The present embodiment provides an industry chain network construction system based on bidding information, the system including a computer-readable storage medium and a processor connected to each other, the computer-readable storage medium having stored thereon a computer program which, when executed by the processor, implements the industry chain network construction method based on bidding information shown in Fig. 1, the method including the following steps A, B, C, D and E.

A. A plurality of pieces of business information are collected, wherein each piece of business information includes a business name, business scope information, and bidding information.

In this embodiment, multiple pieces of enterprise information are acquired from an enterprise information query platform using web crawler technology such as Scrapy or Beautiful Soup, where each piece of enterprise information includes an enterprise name, operation range information, information on the industry to which the enterprise belongs, and bidding information, and the bidding information includes the name of a bidding project related to the enterprise, the bidding unit in the bidding project, and the winning unit. For any bidding project related to a given enterprise, that enterprise is either the bidding unit or the winning unit of the project.

Then, the system cleans and normalizes the collected enterprise information, specifically by performing the following operations: (1) removing useless information such as page tags and redundant spaces, and extracting the remaining information; (2) converting the acquired industry information of the enterprise according to the national standard codes to obtain the national standard industry information of the enterprise, so as to ensure that the information is standardized; (3) deduplicating and screening the information to ensure its quality and accuracy.
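Operations (1) and (3) above can be sketched as follows. This is only an illustrative sketch using the Python standard library; the function names `clean_text` and `deduplicate` and the sample records are hypothetical, and the conversion to national standard industry codes in operation (2) is not shown because the mapping table is not given here.

```python
import html
import re

def clean_text(raw):
    """Remove page tags and redundant whitespace from a scraped string."""
    text = re.sub(r"<[^>]+>", " ", raw)       # drop HTML-like page tags
    text = html.unescape(text)                # decode HTML entities
    return re.sub(r"\s+", " ", text).strip()  # collapse redundant spaces

def deduplicate(records):
    """Keep the first occurrence of each identical record (a dict of strings)."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical scraped records with one exact duplicate.
raw_records = [
    {"name": "<p> Power  Grid <b>Co.</b></p>", "project": "Project 1"},
    {"name": "<p> Power  Grid <b>Co.</b></p>", "project": "Project 1"},
]
cleaned = deduplicate(
    [{field: clean_text(value) for field, value in rec.items()} for rec in raw_records]
)
print(cleaned)
```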

After the collected enterprise information is cleaned and normalized, the system obtains the number of transactions between enterprises by counting the different bidding projects that have the same bidding unit and the same winning unit. Edge weights between enterprises can later be assigned according to this number, which plays an important role in the subsequent analysis. For example, suppose three bidding projects are related to Enterprise A: in bidding project one, the bidding unit is Enterprise A and the winning unit is Enterprise B; in bidding project two, the bidding unit is Enterprise C and the winning unit is Enterprise A; in bidding project three, the bidding unit is Enterprise A and the winning unit is Enterprise B. According to the bidding information of Enterprise A, two bidding projects (projects one and three) have Enterprise A as the bidding unit and Enterprise B as the winning unit, so the number of transactions between Enterprise A and Enterprise B is 2; one bidding project (project two) has Enterprise C as the bidding unit and Enterprise A as the winning unit, so the number of transactions between Enterprise C and Enterprise A is 1.
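The counting rule in the example above can be sketched in a few lines. The record layout (project name, bidding unit, winning unit) and the function name `count_transactions` are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical records: (bidding project, bidding unit, winning unit).
records = [
    ("Project 1", "A", "B"),
    ("Project 2", "C", "A"),
    ("Project 3", "A", "B"),
]

def count_transactions(records):
    """Count distinct projects for each (bidding unit, winning unit) pair."""
    seen = set()
    counts = Counter()
    for project, bidding_unit, winning_unit in records:
        key = (project, bidding_unit, winning_unit)
        if key in seen:  # the same project must not be counted twice
            continue
        seen.add(key)
        counts[(bidding_unit, winning_unit)] += 1
    return counts

transactions = count_transactions(records)
print(transactions[("A", "B")])  # 2
print(transactions[("C", "A")])  # 1
```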

Then, the system stores the cleaned and normalized enterprise information and the numbers of transactions between enterprises in a MySQL database. The enterprise information finally stored comprises enterprise names, operation range information, national standard industry information of the enterprises, bidding project names, bidding units, winning units and numbers of transactions.

B. And constructing an industry chain network according to the plurality of pieces of enterprise information, wherein nodes of the industry chain network are generated according to enterprise names in each piece of enterprise information, and edges between the nodes are generated according to bidding information in each piece of enterprise information.

When constructing the industry chain network, the system first imports the pieces of enterprise information stored in the MySQL database and then uses them to build the network, specifically through the following five steps:

First, the system imports the pieces of enterprise information stored in the MySQL database, wherein each piece of enterprise information comprises an enterprise name, operation range information, national standard industry information of the enterprise, bidding project names, bidding units, winning units and numbers of transactions.

Second, the enterprise name in each piece of enterprise information is taken as a unique identifier, and the nodes of the industry chain network are generated from the enterprise names, so that each node corresponds to one enterprise and the uniqueness and consistency of the nodes are ensured.

Third, the edges between the nodes are generated according to the bidding information in each piece of enterprise information. Specifically, since the bidding information includes the name of a bidding project related to the enterprise, the bidding unit in the bidding project and the winning unit in the bidding project, an edge can be generated between the enterprise node corresponding to the bidding unit and the enterprise node corresponding to the winning unit. For example, in bidding projects one and three above, the bidding unit is Enterprise A and the winning unit is Enterprise B, so an edge is generated between the Enterprise A node and the Enterprise B node; in bidding project two, the bidding unit is Enterprise C and the winning unit is Enterprise A, so an edge is generated between the Enterprise C node and the Enterprise A node.

Fourth, the direction of each edge is determined according to the relationship among the bidding project, the bidding unit and the winning unit in the bidding information. Specifically, the edge points from the node corresponding to the winning unit to the node corresponding to the bidding unit. For example, in bidding projects one and three above, the bidding unit is Enterprise A and the winning unit is Enterprise B, so the edge points from the Enterprise B node to the Enterprise A node; in bidding project two, the bidding unit is Enterprise C and the winning unit is Enterprise A, so the edge points from the Enterprise A node to the Enterprise C node.

Fifth, the bidding project weight between nodes is calculated from the number of transactions between enterprises, and the thickness of each edge is defined by the calculated weight, so that the edge thickness reflects the importance of the relationship between the nodes. Specifically, the weight is calculated by min-max normalization with the following formula:

w = (t - minT) / (maxT - minT);

where w is the bidding project weight between two enterprise nodes, t is the number of transactions between the current pair of nodes, minT is the minimum number of transactions between any pair of nodes in the industry, and maxT is the maximum number of transactions between any pair of nodes in the industry. Since t ≥ minT and maxT ≥ t, it follows that t - minT ≥ 0; with maxT - minT > 0, this gives w ≥ 0. Since maxT - minT ≥ t - minT, it also follows that w ≤ 1. The bidding project weight w between nodes therefore lies in the range [0, 1], and the larger the value of w, the thicker the edge between the nodes.
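The min-max formula can be checked with a short sketch. The function name `min_max_weights` and the sample transaction counts are hypothetical, and the fallback used when all counts are equal (maxT = minT, where the formula is undefined) is an assumption of this sketch rather than something stated in the embodiment.

```python
def min_max_weights(transactions):
    """Map each transaction count t to w = (t - minT) / (maxT - minT)."""
    min_t = min(transactions.values())
    max_t = max(transactions.values())
    if max_t == min_t:
        # Assumption: with identical counts the formula divides by zero,
        # so this sketch gives every edge the same weight of 1.0.
        return {edge: 1.0 for edge in transactions}
    return {
        edge: (t - min_t) / (max_t - min_t)
        for edge, t in transactions.items()
    }

# Hypothetical transaction counts per (bidding unit, winning unit) pair.
transactions = {("A", "B"): 2, ("C", "A"): 1, ("B", "D"): 4}
weights = min_max_weights(transactions)
print(weights)
```

With these counts, the pair with the fewest transactions gets weight 0, the pair with the most gets weight 1, and the pair with 2 transactions gets (2 - 1) / (4 - 1) = 1/3.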

Thus, an industrial chain network is constructed through the five-step flow, in the industrial chain network, nodes represent enterprises, edges represent relationships among the enterprises, and thickness of the edges represents relationship importance among the enterprises.
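The five-step flow can be summarized in a minimal sketch that builds the nodes and the directed edges from stored rows. The row layout (bidding unit, winning unit, number of transactions) and the function name `build_network` are assumptions for illustration; a real implementation would read the rows from the MySQL database described above.

```python
from collections import defaultdict

# Hypothetical rows: (bidding unit, winning unit, number of transactions).
rows = [("A", "B", 2), ("C", "A", 1)]

def build_network(rows):
    """Return (nodes, edges); edges map (winning unit, bidding unit) to a count."""
    nodes = set()
    edges = defaultdict(int)
    for bidding_unit, winning_unit, count in rows:
        # Each enterprise name becomes exactly one node.
        nodes.update((bidding_unit, winning_unit))
        # The edge points from the winning unit to the bidding unit.
        edges[(winning_unit, bidding_unit)] += count
    return nodes, dict(edges)

nodes, edges = build_network(rows)
print(sorted(nodes))  # ['A', 'B', 'C']
print(edges)          # {('B', 'A'): 2, ('A', 'C'): 1}
```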

C. Core nodes in the industry chain network are identified.

To gain a better understanding of the industry chain network, the present embodiment identifies the core nodes in the industry chain network based on a graph theory model after the network has been constructed from the pieces of enterprise information. Specifically, the system obtains the number of nodes in the constructed industry chain network and the number of connections between the nodes, calculates several node centrality metrics commonly used in graph theory models, combines these metrics into a core composite score for each node to determine the importance of the node, and selects the nodes of higher importance as the core nodes in the industry chain network. This specifically comprises the following steps C1, C2 and C3.

C1. For each node in the industry chain network, calculating a plurality of centrality metrics, wherein the plurality of centrality metrics comprises: a degree centrality index representing the number of connections of the node in the industry chain network, a betweenness centrality index representing the number of times the node acts as an intermediate node in the industry chain network, a closeness centrality index representing the average value of the distances from the node to all other nodes, and a clustering coefficient index representing the degree of interconnection between all neighboring nodes of the node.

The core nodes in the industrial chain network are identified based on the graph theory model, and the importance of the nodes needs to be evaluated according to a plurality of centrality measurement indexes of each node, so that a plurality of centrality measurement indexes need to be calculated for each node in the industrial chain network, and the commonly used centrality measurement indexes comprise the following four types:

(1) The degree centrality index (Degree Centrality) represents the number of connections between the node and other nodes in the whole industry chain network, and the calculation formula is specifically as follows:

$$DC(i) = \frac{\deg(i)}{n-1}$$

wherein DC(i) represents the degree centrality index of node i; deg(i) represents the number of connections between node i and other nodes in the industry chain network; n represents the number of nodes in the industry chain network. The higher the value of the degree centrality index, the more nodes are connected to node i in the whole industry chain network, which means that node i has more information resources and greater productivity.

(2) The betweenness centrality index (Betweenness Centrality) represents the number of times the node acts as an intermediate node in the industry chain network, that is, the number of times the node is passed through as an intermediary, and is used to evaluate the degree to which the node serves as an intermediary for information transmission in the industry chain network. The calculation formula is specifically as follows:

$$BC(i) = \sum_{j \neq i \neq k \in V} \frac{\sigma_{jk}(i)}{\sigma_{jk}}$$

wherein BC(i) represents the betweenness centrality index of node i; V represents the set of all nodes in the industry chain network, and j and k represent two different nodes in the set V; σ_jk represents the number of shortest paths between node j and node k; σ_jk(i) represents the number of those shortest paths between node j and node k in which node i is an intermediate node. The higher the betweenness centrality index, the greater the degree to which the node serves as an intermediary for information transmission in the industry chain network; such a node occupies a key transit position and is of great importance.

(3) The closeness centrality index (Closeness Centrality) represents the average value of the distances from the node to all other nodes and reflects how close the node is to the other nodes. The calculation formula is specifically as follows:

$$AC(i) = \frac{n-1}{\sum_{j \in V,\, j \neq i} d(j,i)}$$

wherein AC(i) represents the closeness centrality index of node i; V represents the set of all nodes in the industry chain network, and j represents any node in the set V other than node i; d(j,i) represents the distance from node i to node j; n represents the number of nodes in the industry chain network. The higher the closeness centrality index, the shorter the average distance from node i to all other nodes, which means that node i is more closely linked to the other nodes.

(4) The clustering coefficient index (Clustering Coefficient) represents the degree of interconnection between all neighboring nodes of the node, and the calculation formula is specifically as follows:

$$CC(i) = \frac{2\,t(i)}{k(i)\,(k(i)-1)}$$

wherein CC(i) represents the clustering coefficient index of node i; k(i) represents the degree of node i, namely the number of connections between node i and its neighboring nodes, and reflects how many neighboring nodes node i has; t(i) represents the number of edges between all neighboring nodes of node i, and reflects the degree of interconnection between all neighboring nodes of node i.
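The four indexes can be computed on a small example with the standard library alone. The sketch below is illustrative only and rests on several assumptions: the network is treated as undirected, unweighted and connected; DC(i) is normalized as deg(i)/(n-1); AC(i) is taken as (n-1) divided by the sum of distances; CC(i) is taken as 2t(i)/(k(i)(k(i)-1)); and BC(i) counts each unordered pair {j, k} once without further normalization. All function names and the sample graph are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def degree_centrality(adj):
    n = len(adj)
    return {v: len(adj[v]) / (n - 1) for v in adj}

def closeness_centrality(adj):
    n = len(adj)
    result = {}
    for v in adj:
        total = sum(bfs_distances(adj, v).values())
        result[v] = (n - 1) / total if total else 0.0
    return result

def clustering_coefficient(adj):
    result = {}
    for v, neigh in adj.items():
        k = len(neigh)
        # Edges among the neighbours of v; each is seen from both ends.
        t = sum(len(adj[u] & neigh) for u in neigh) // 2
        result[v] = 2 * t / (k * (k - 1)) if k > 1 else 0.0
    return result

def betweenness_centrality(adj):
    """Brandes-style accumulation of sigma_jk(i) / sigma_jk over all pairs."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both of its endpoints.
    return {v: score / 2 for v, score in bc.items()}

# Hypothetical undirected network: A is linked to B, C and D; B is linked to C.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
dc = degree_centrality(adj)
bc = betweenness_centrality(adj)
ac = closeness_centrality(adj)
cc = clustering_coefficient(adj)
print(dc["A"], bc["A"], ac["A"], cc["A"])
```

In this sample, node A is connected to every other node, lies on the only shortest paths from D to B and from D to C, and has neighbors that are only partly interconnected, so it scores highest on the first three indexes but not on the clustering coefficient.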

C2. And respectively giving different weights to each centrality measurement index, and carrying out weighted calculation on each centrality measurement index according to the weights to obtain the core comprehensive score of each node.

After the degree centrality index DC(i), the betweenness centrality index BC(i), the closeness centrality index AC(i) and the clustering coefficient index CC(i) of each node are calculated, the system assigns a different weight to each centrality metric. Specifically, the weight w₁ of the degree centrality index DC(i) is 0.4, the weight w₂ of the betweenness centrality index BC(i) is 0.3, the weight w₃ of the closeness centrality index AC(i) is 0.2, and the weight w₄ of the clustering coefficient index CC(i) is 0.1. The centrality metrics are then combined by weighted calculation according to these weights to obtain the core composite score of each node, and the calculation formula is as follows:

SC(i) = w₁*DC(i) + w₂*BC(i) + w₃*AC(i) + w₄*CC(i);

wherein SC(i) represents the core composite score of node i; DC(i) represents the degree centrality index of node i; BC(i) represents the betweenness centrality index of node i; AC(i) represents the closeness centrality index of node i; CC(i) represents the clustering coefficient index of node i; w₁ represents the weight of the degree centrality index of node i; w₂ represents the weight of the betweenness centrality index of node i; w₃ represents the weight of the closeness centrality index of node i; w₄ represents the weight of the clustering coefficient index of node i. The larger the value of the core composite score SC(i), the higher the importance in the industry chain network of the enterprise corresponding to node i and the more likely that enterprise is a core enterprise, meaning that node i is more likely to be a core node.
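The weighted sum can be sketched as follows, using the weights 0.4, 0.3, 0.2 and 0.1 of this embodiment. The function name `core_composite_score` and the sample metric values are hypothetical; the sketch also assumes that the four indexes have already been computed for the node.

```python
# Weights of this embodiment for DC, BC, AC and CC respectively.
WEIGHTS = {"DC": 0.4, "BC": 0.3, "AC": 0.2, "CC": 0.1}

def core_composite_score(metrics, weights=WEIGHTS):
    """SC(i) = w1*DC(i) + w2*BC(i) + w3*AC(i) + w4*CC(i)."""
    return sum(weights[name] * metrics[name] for name in weights)

# Hypothetical centrality values of one node.
metrics = {"DC": 1.0, "BC": 0.5, "AC": 0.8, "CC": 0.2}
score = core_composite_score(metrics)
print(round(score, 2))  # 0.73
```

Here the score is 0.4*1.0 + 0.3*0.5 + 0.2*0.8 + 0.1*0.2 = 0.73. Because an unnormalized betweenness value can be much larger than the other three indexes, a practical implementation may need to bring the four indexes onto comparable scales before weighting; the text does not specify this.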

C3. Calculating the average of the core composite scores that rank within a preset number of top places, and selecting the nodes whose core composite scores are not lower than this average as the core nodes in the industry chain network.

After the core composite score of each node in the industry chain network is calculated, the nodes whose scores rank within a preset number of top places could simply be defined as core nodes, for example the five nodes with the highest core composite scores. In practice, however, a gap in the core composite scores may occur: for example, the top-ranked node may have a particularly high core composite score that differs greatly from those of the other nodes, in which case only that node should be defined as a core node rather than the top five. Therefore, in this embodiment the core nodes are screened out with a threshold. The threshold is set from the core composite scores of all nodes: the core composite scores ranked within a preset number of top places are averaged, and the average is used as the threshold. For example, the core composite scores of the top 10 nodes are averaged, and the nodes whose core composite scores are not lower than this threshold are selected as the core nodes in the industry chain network. The specific operation is as follows:

(1) Sequencing all nodes from high to low according to the core comprehensive score to obtain a sequence:

SC(1)≥SC(2)≥SC(3)≥…≥SC(n)；

where n represents the number of nodes in the industry chain network, SC (1) represents the highest ranked core composite score, SC (2) represents the second highest ranked core composite score, SC (3) represents the third highest ranked core composite score, …, and SC (n) represents the lowest ranked core composite score.

(2) The threshold T for screening out the core nodes is determined by the following formula:

$$T = \frac{SC(1) + SC(2) + \cdots + SC(m)}{m}$$

wherein T represents the threshold for screening out the core nodes, m represents the preset ranking, and SC(m) represents the core composite score ranked m-th; in this embodiment, the specific value of m is 10.

(3) The core composite scores of all nodes are divided into a qualified sample set and an unqualified sample set according to the threshold T (i.e., the average of the core composite scores ranked within the preset number of top places), and the nodes whose core composite scores are not lower than the threshold T are selected as the core nodes in the industry chain network. Specifically, the qualified sample set is { i | SC(i) ≥ T } and the unqualified sample set is { i | SC(i) < T }, where SC(i) represents the core composite score of node i. Every core composite score in the qualified sample set is thus not lower than the threshold T, and this embodiment selects the nodes corresponding to all core composite scores in the qualified sample set as the core nodes in the industry chain network. This prevents nodes whose scores differ greatly from being selected together as core nodes when there is a gap in the core composite scores, and improves the stability and accuracy of core node identification.
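Operations (1) to (3) can be sketched as follows. The function name `select_core_nodes` and the sample scores are hypothetical; the sketch uses m = 4 only because the sample has four nodes, whereas this embodiment uses m = 10.

```python
def select_core_nodes(scores, m=10):
    """Return (threshold, core nodes) from a {node: core composite score} dict."""
    ranked = sorted(scores.values(), reverse=True)  # SC(1) >= SC(2) >= ...
    top = ranked[:m]
    threshold = sum(top) / len(top)                 # mean of the top-m scores
    core = [node for node, s in scores.items() if s >= threshold]
    return threshold, core

# Hypothetical scores with a clear gap after the first node.
scores = {"A": 0.9, "B": 0.3, "C": 0.2, "D": 0.2}
threshold, core = select_core_nodes(scores, m=4)
print(core)  # ['A']
```

In this sample the threshold is about 0.4, so only node A qualifies, which matches the case described above where the top-ranked node stands far apart from the rest.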

D. Identify the enterprise characteristics corresponding to each node in the industry chain network.

To gain a deeper understanding of the industry chain network, this embodiment identifies the enterprise characteristics corresponding to each node after the industry chain network has been built from the plurality of pieces of enterprise information. Specifically, the system obtains the business scope information in the enterprise information of each node from the constructed industry chain network, and then uses the TF-IDF algorithm to calculate the word frequency inverse document frequency (i.e., TF-IDF value) of the business scope information of each node, so as to analyze the main products and main services of the enterprise corresponding to each node and thereby identify the enterprise characteristics corresponding to each node in the industry chain network. This specifically comprises the following steps D1, D2 and D3.

D1. Perform a word segmentation operation on the operation range information of each node in the industry chain network to obtain a plurality of words.

In this embodiment, before the TF-IDF algorithm is used to calculate the word frequency inverse document frequency (i.e., TF-IDF value) of the operation range information of each node in the industry chain network, a word segmentation operation is performed on the operation range information of each node to obtain a plurality of words. The word segmentation in this embodiment is performed with jieba, a third-party Chinese word segmentation library. Its principle is to use a Chinese lexicon to determine the association probabilities between Chinese characters and to group characters with a high association probability into words, which form the segmentation result. jieba offers three segmentation modes: the first is precise mode, which segments the text exactly, without redundant words; the second is full mode, which scans out all possible words in the text and therefore contains redundancy (the same text can be split into different words from different angles), so that different words can be mined; the third is search engine mode, which re-segments long words on the basis of precise mode. The jieba word segmentation operation of this embodiment uses precise mode.

For example, the operation range information of a certain node O is: fertilizer, pesticide, seed, plastic, hardware, mechanical equipment and accessories, steel, wood, mineral products; sales and agency. After the operation range information of node O is segmented, 12 words are obtained, namely: "fertilizer", "pesticide", "seed", "plastic", "hardware", "mechanical equipment", "accessory", "steel", "wood", "mineral product", "sales" and "agency".

D2. Calculate the word frequency inverse document frequency of each word, and extract the words whose word frequency inverse document frequency ranks above a preset ranking as the operation range keywords of the node.

After the word segmentation operation has been performed on the operation range information of a node to obtain a plurality of words, the system uses the TF-IDF algorithm to calculate the word frequency inverse document frequency (i.e., TF-IDF value) of each word. Specifically, it first calculates the word frequency TF of each word in the current text (i.e., the operation range information), then calculates the inverse document frequency IDF, and finally combines TF and IDF to obtain the TF-IDF value of the word. Taking node O as an example, the calculation process is as follows:

(1) Calculate the word frequency TF of each word in the operation range information of the node.

Segmenting the operation range information of node O yields the 12 words "fertilizer", "pesticide", "seed", "plastic", "hardware", "mechanical equipment", "accessory", "steel", "wood", "mineral product", "sales" and "agency". The word "fertilizer" appears once in the operation range information of node O, so its word frequency TF is 1/12; the word "pesticide" appears once, so its TF is 1/12; ……; and the word "agency" appears once, so its TF is 1/12.

(2) Calculate the inverse document frequency IDF of each word in the operation range information of the node, using the following formula:

IDF = log(N/(DF+1));

where N represents the total number of documents in the corpus, i.e., the total number of pieces of operation range information of all nodes in the industry chain network, which equals the number of nodes in the industry chain network; DF represents the number of documents containing the word whose TF-IDF value is being calculated, i.e., the number of pieces of operation range information, among all nodes in the industry chain network, that contain that word.

(3) Combine the word frequency TF and the inverse document frequency IDF to obtain the word frequency inverse document frequency (i.e., TF-IDF value) of the word, using the following formula: TF-IDF = TF*IDF = TF*log(N/(DF+1)).

The system then extracts the words whose TF-IDF values rank above a preset ranking as the operation range keywords of the node, for example the top 4 words by TF-IDF value. In this embodiment, the TF-IDF algorithm is applied to the operation range information of node O, and the resulting TF, IDF and TF-IDF values of each word are shown in fig. 2. The four words with the highest TF-IDF values are "hardware", "pesticide", "sales" and "agency", so these 4 words are extracted as the operation range keywords of the node.
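The calculation in step D2 can be illustrated with a short pure-Python sketch. It assumes that word segmentation has already been done, that the logarithm is the natural logarithm (the text does not state the base), and that ties are broken alphabetically; the function names and the four-node corpus are hypothetical.

```python
import math


def tf_idf(docs):
    """docs maps node -> list of segmented words; returns node -> {word: TF-IDF}."""
    n = len(docs)  # N: one piece of operation range information per node
    df = {}        # DF: number of documents that contain each word
    for words in docs.values():
        for word in set(words):
            df[word] = df.get(word, 0) + 1
    result = {}
    for node, words in docs.items():
        total = len(words)
        result[node] = {
            # TF-IDF = TF * log(N / (DF + 1)); natural log is an assumption.
            word: (words.count(word) / total) * math.log(n / (df[word] + 1))
            for word in set(words)
        }
    return result


def top_keywords(weights, k=4):
    """Return the k words with the highest TF-IDF value (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical corpus of four nodes, already segmented.
corpus = {
    "O": ["hardware", "pesticide", "sales", "steel"],
    "P": ["steel", "sales"],
    "Q": ["steel", "wood"],
    "R": ["steel", "sales"],
}
weights = tf_idf(corpus)
keywords_o = top_keywords(weights["O"], k=2)
```

Note that with the DF+1 term a word that occurs in every document of a small corpus receives a negative value, as "steel" does here, which pushes very common words to the bottom of the ranking.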

D3. Classify each operation range keyword of the node by using the trained classification model to obtain the enterprise characteristics of each node.

In this embodiment, after the operation range keywords of the nodes are obtained, the trained classification model is used to classify each operation range keyword of each node, thereby obtaining the enterprise characteristics of each node. The classification model is trained as follows:

(1) A training sample set and a test sample set are constructed.

As described above, a word segmentation operation is performed on the operation range information of each node in the industry chain network to obtain a plurality of words, and the word frequency inverse document frequency (i.e., TF-IDF value) of each word is calculated. A plurality of feature vectors can then be constructed from the TF-IDF values of the words in the operation range information of each node. All feature vectors are manually labeled "product" or "service" according to the attribute of the corresponding word, and the labeled feature vectors are divided into a training sample set and a test sample set according to a preset ratio.

For example, after the operation range information of node O is segmented into the 12 words "fertilizer", "pesticide", "seed", "plastic", "hardware", "mechanical equipment", "accessory", "steel", "wood", "mineral product", "sales" and "agency", the TF-IDF value of each word is calculated, as shown in fig. 2. A feature vector is then constructed for each word from its TF-IDF value: the feature value at the position of the word itself is the word's TF-IDF value, and the feature values at the positions of the other words are 0. Thus a first feature vector (0.133,0,0,0,0,0,0,0,0,0,0,0) can be constructed from the word "fertilizer" and its TF-IDF value, a second feature vector (0,0.267,0,0,0,0,0,0,0,0,0,0) from the word "pesticide" and its TF-IDF value, ……, and a twelfth feature vector (0,0,0,0,0,0,0,0,0,0,0,0.267) from the word "agency" and its TF-IDF value. Assuming that the industry chain network has 100 nodes and that the operation range information of each node yields 12 words after word segmentation, 12 feature vectors can be constructed from the operation range information of each node, giving 1200 feature vectors in total for the 100 nodes. All feature vectors are then manually labeled "product" or "service" according to the attribute of the word: for example, the first feature vector is labeled "product" according to the attribute of the word "fertilizer", the second feature vector is labeled "product" according to the attribute of the word "pesticide", ……, and the twelfth feature vector is labeled "service" according to the attribute of the word "agency". Finally, all labeled feature vectors are divided into a training sample set and a test sample set according to a preset ratio (for example, 7:3).
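The construction of the feature vectors and the division into sample sets can be sketched as follows. The helper names, the random shuffle before splitting and the fixed seed are assumptions for illustration; the text only specifies the vector layout and the preset ratio.

```python
import random


def build_feature_vectors(words, tfidf):
    """Build one vector per word: its own position holds its TF-IDF value, the rest are 0."""
    vectors = []
    for index, word in enumerate(words):
        vector = [0.0] * len(words)
        vector[index] = tfidf[word]
        vectors.append((word, vector))
    return vectors


def split_samples(samples, train_ratio=0.7, seed=0):
    """Shuffle labeled samples and divide them by the preset ratio (7:3 by default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # shuffling is an assumption, not stated in the text
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Three of the words of node O with the TF-IDF values quoted in the text.
words_o = ["fertilizer", "pesticide", "agency"]
tfidf_o = {"fertilizer": 0.133, "pesticide": 0.267, "agency": 0.267}
vectors_o = build_feature_vectors(words_o, tfidf_o)

# Ten hypothetical labeled samples, only to show the 7:3 division.
labeled = [(i, "product" if i % 2 == 0 else "service") for i in range(10)]
train_set, test_set = split_samples(labeled)
```

In a full run the vector length would be the number of words under consideration (12 for node O), and the labels would come from the manual annotation described above.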

(2) Input the training sample set into the classification model for training, test the trained classification model with the test sample set, and correct the parameters of the classification model according to the test result.

In this embodiment, a naive Bayes classifier is adopted as the classification model. After the training sample set is input into the classification model for training, the test sample set is used to test the trained classification model, and the precision (Precision), recall (Recall) and comprehensive performance evaluation index F_e obtained in the test are calculated to evaluate the performance of the trained classification model.
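As an illustration of how a naive Bayes classifier could be trained on such feature vectors, the following sketch implements a small multinomial variant with additive smoothing. The text does not say which naive Bayes variant or smoothing is used, so the variant, the parameter `alpha`, the function names and the four-word training set are all assumptions.

```python
import math


def train_multinomial_nb(samples, alpha=1.0):
    """samples is a list of (vector, label); returns per-class log prior and log likelihoods."""
    dim = len(samples[0][0])
    class_counts = {}
    feature_sums = {}
    for vector, label in samples:
        class_counts[label] = class_counts.get(label, 0) + 1
        sums = feature_sums.setdefault(label, [0.0] * dim)
        for j, value in enumerate(vector):
            sums[j] += value
    model = {}
    for label, sums in feature_sums.items():
        denominator = sum(sums) + alpha * dim  # additive smoothing (assumed)
        model[label] = {
            "log_prior": math.log(class_counts[label] / len(samples)),
            "log_likelihood": [math.log((s + alpha) / denominator) for s in sums],
        }
    return model


def predict(model, vector):
    """Return the label with the highest posterior log score for the vector."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            x * ll for x, ll in zip(vector, params["log_likelihood"])
        )
    return max(model, key=score)


# Hypothetical training set; positions are fertilizer, pesticide, sales, agency.
training = [
    ([0.133, 0.0, 0.0, 0.0], "product"),
    ([0.0, 0.267, 0.0, 0.0], "product"),
    ([0.0, 0.0, 0.267, 0.0], "service"),
    ([0.0, 0.0, 0.0, 0.267], "service"),
]
nb_model = train_multinomial_nb(training)
label_pesticide = predict(nb_model, [0.0, 0.267, 0.0, 0.0])
label_agency = predict(nb_model, [0.0, 0.0, 0.0, 0.267])
```

Because each vector has a single non-zero position, the model in this sketch can only separate words that occurred in the training samples; a word never seen in training falls back on the smoothed estimates and the class priors.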

In this embodiment, feature vectors labeled "product" are taken as positive samples, and feature vectors labeled "service" are taken as negative samples. Precision indicates the proportion of samples correctly classified as positive among all samples that the classification model classified as positive, and Recall indicates the proportion of samples correctly classified as positive among all positive samples. The calculation formulas are:

Precision = TP/(TP+FP);

Recall = TP/(TP+FN);

where TP represents true positives, i.e., the number of positive samples correctly classified as positive; FP represents false positives, i.e., the number of negative samples erroneously classified as positive; FN represents false negatives, i.e., the number of positive samples erroneously classified as negative.

In this embodiment, the values of TP, FP and FN are counted as follows. When the trained classification model is tested with the test sample set, if the classification result obtained after inputting a feature vector into the classification model is positive and the label of the feature vector is "product" (a positive sample), the feature vector is marked as a positive sample correctly classified as positive, corresponding to TP. If the classification result is positive and the label of the feature vector is "service" (a negative sample), the feature vector is marked as a negative sample erroneously classified as positive, corresponding to FP. If the classification result is negative and the label of the feature vector is "product" (a positive sample), the feature vector is marked as a positive sample erroneously classified as negative, corresponding to FN. The number of each kind of mark is then counted to obtain the values of TP, FP and FN.
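The counting rule above, together with the standard definitions Precision = TP/(TP+FP) and Recall = TP/(TP+FN), can be sketched as follows. The function names and the five test results are hypothetical, and returning 0 when a denominator is zero is an assumption.

```python
def confusion_counts(predicted, actual, positive="product"):
    """Count TP, FP and FN, treating the label 'product' as the positive class."""
    tp = fp = fn = 0
    for pred, label in zip(predicted, actual):
        if pred == positive and label == positive:
            tp += 1  # positive sample correctly classified as positive
        elif pred == positive:
            fp += 1  # negative sample erroneously classified as positive
        elif label == positive:
            fn += 1  # positive sample erroneously classified as negative
    return tp, fp, fn


def precision_recall(tp, fp, fn):
    """Return (Precision, Recall); 0.0 is returned when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical test results for five feature vectors.
predicted = ["product", "product", "service", "product", "service"]
actual = ["product", "service", "product", "product", "service"]
tp, fp, fn = confusion_counts(predicted, actual)
precision, recall = precision_recall(tp, fp, fn)
```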

The higher the values of Precision and Recall, the better the classification performance of the classification model; however, Precision tends to be lower when Recall is higher, and vice versa. Therefore, the weighted harmonic mean of Recall and Precision is used as the comprehensive performance evaluation index F_e to evaluate the performance of the trained classification model. The larger the value of F_e, the better the performance of the classification model.

Each time the training sample set has been input into the classification model for training, the test sample set is used to test the trained classification model and obtain the value of the comprehensive performance evaluation index F_e. If the value of F_e does not reach a preset threshold, the parameters of the classification model are corrected, the classification model continues to be trained with the training sample set, and the trained classification model is tested again with the test sample set to obtain a new value of F_e. These steps are repeated until the value of F_e reaches the preset threshold, which indicates that the parameters of the classification model are appropriate and that its performance is good. Training of the classification model is then complete, and the trained classification model can reliably classify each operation range keyword as "product" or "service".
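A weighted harmonic mean of Precision and Recall is commonly written in the F-beta form, (1 + beta^2) * Precision * Recall / (beta^2 * Precision + Recall). The sketch below assumes that form; the weight `beta`, the target value 0.9 and the function names are assumptions, since the text does not give the weights or the preset threshold.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of Precision and Recall in the common F-beta form (assumed)."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


def training_finished(precision, recall, target=0.9, beta=1.0):
    """Stop criterion: the index must reach the preset threshold (0.9 is a placeholder)."""
    return f_score(precision, recall, beta) >= target


balanced = f_score(0.5, 1.0)               # beta = 1 weights both equally
recall_heavy = f_score(0.5, 1.0, beta=2.0)  # beta > 1 weights Recall more heavily
keep_training = not training_finished(0.5, 1.0)
```

With beta equal to 1 the index reduces to the ordinary harmonic mean, so a model with high Recall but low Precision, as in the example, still falls short of the threshold and would be trained further.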

In this embodiment, the trained classification model classifies the operation range keywords "hardware" and "pesticide" of node O as "product" and the operation range keywords "sales" and "agency" as "service", so the enterprise characteristics of node O are obtained as follows: the main products are hardware and pesticide, and the main services are sales and agency.
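Grouping the classified keywords into enterprise characteristics can be sketched as below. The lookup table `stand_in_labels` merely stands in for the trained classification model so that the example is self-contained; it and the function name are hypothetical.

```python
def enterprise_features(keywords, classify):
    """Group a node's operation range keywords by the label the classifier assigns."""
    features = {"product": [], "service": []}
    for keyword in keywords:
        features[classify(keyword)].append(keyword)
    return features


# Stand-in for the trained classification model (hypothetical lookup table).
stand_in_labels = {
    "hardware": "product",
    "pesticide": "product",
    "sales": "service",
    "agency": "service",
}

features_o = enterprise_features(
    ["hardware", "pesticide", "sales", "agency"], stand_in_labels.get
)
```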

E. The core node and enterprise characteristics of each node are visually updated in the industry chain network.

After the core nodes in the industry chain network are identified in step C and the enterprise characteristics of each node are obtained in step D, the core nodes and the enterprise characteristics of each node are visually updated in the industry chain network. For example, the core nodes are drawn at an enlarged node size, and the enterprise characteristics of each node are added directly to that node, so that a user viewing the industry chain network can quickly tell which nodes are core nodes from the node size and can read the enterprise characteristics directly on the nodes. Therefore, based on the industry chain network, not only can the industry to which the enterprise corresponding to each node belongs and its position in the industry chain be determined, but the core enterprises among all enterprises of the industry chain can also be determined based on the core nodes, and the main business information of all enterprises of the industry chain can be determined based on the enterprise characteristics of each node. This gives outside parties a deeper understanding of the industry chain network and provides effective support for external decision-making.
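One possible way to prepare such a visual update is to compute a size and a label for each node before handing them to whatever drawing tool is used. The sketch below is independent of any plotting library; the base size, the scale factor, the label layout and the function name are all assumptions.

```python
def build_node_styles(nodes, core_nodes, features, base_size=10, core_scale=2.0):
    """Return node -> {'size', 'label'}: core nodes are enlarged, features go into the label."""
    core = set(core_nodes)
    styles = {}
    for node in nodes:
        node_features = features.get(node, {})
        products = ", ".join(node_features.get("product", []))
        services = ", ".join(node_features.get("service", []))
        styles[node] = {
            # Enlarge core nodes so they stand out at a glance.
            "size": base_size * core_scale if node in core else base_size,
            # Put the enterprise characteristics directly on the node.
            "label": f"{node} | products: {products} | services: {services}",
        }
    return styles


styles = build_node_styles(
    nodes=["O", "P"],
    core_nodes=["O"],
    features={"O": {"product": ["hardware", "pesticide"], "service": ["sales", "agency"]}},
)
```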

The above embodiments are provided only to illustrate the present invention and are not intended to limit the scope of patent protection. Insubstantial changes and substitutions made by those skilled in the art on the basis of the present invention still fall within the scope of the claims.

Claims

1. The industrial chain network construction method based on bidding information is characterized by comprising the following steps:

2. The method according to claim 1, wherein in the step A, the bidding information includes the name of a bidding project related to the enterprise, and the bidding unit and the winning unit of the bidding project; in the step B, when the edges between the nodes are generated based on the bidding information in each piece of enterprise information, the direction of the edge between the nodes is determined based on the relationship among the bidding project, the bidding unit and the winning unit in the bidding information.

3. The method for building an industrial chain network based on bidding information according to claim 2, wherein in the step a, the number of transactions between enterprises is obtained according to the number of different bidding items including the same bidding unit and winning unit in the bidding information; in the step B, the bidding project weight among the nodes is calculated according to the transaction times among enterprises, and the thickness of the edges among the nodes is defined according to the calculated bidding project weight.

4. The method for building an industrial chain network based on bidding information according to claim 1, wherein in the step C1:

the calculation formula of the cluster coefficient index isWherein CC (i) represents a cluster coefficient index of the node i, k (i) represents the number of connections between the node i and its neighboring nodes, and t (i) represents the number of edges between all neighboring nodes of the node i.

5. The method for building an industrial chain network based on bidding information according to claim 4, wherein in the step C2, the calculation formula of the core composite score is: SC(i) = w₁*DC(i) + w₂*BC(i) + w₃*AC(i) + w₄*CC(i); wherein SC(i) represents the core composite score of node i, DC(i) represents the degree centrality index of node i, BC(i) represents the betweenness centrality index of node i, AC(i) represents the closeness centrality index of node i, CC(i) represents the clustering coefficient index of node i, w₁ represents the weight of the degree centrality index of node i, w₂ represents the weight of the betweenness centrality index of node i, w₃ represents the weight of the closeness centrality index of node i, and w₄ represents the weight of the clustering coefficient index of node i.

6. The method for building an industrial chain network based on bidding information according to claim 1, wherein in the step D2, the calculation formula of the word frequency inverse document frequency of each word is: TF-IDF = TF*IDF = TF*log(N/(DF+1)); wherein TF-IDF represents the word frequency inverse document frequency of the word, TF represents the word frequency of the word, IDF represents the inverse document frequency of the word, N represents the total number of pieces of operation range information of all nodes in the industry chain network, and DF represents the number of pieces of operation range information, among all nodes in the industry chain network, that contain the word whose word frequency inverse document frequency is being calculated.

7. The method for building an industrial chain network based on bidding information according to claim 1, wherein in step D3, training the classification model comprises the following operations: and constructing a training sample set and a test sample set, inputting the training sample set into the classification model for training, testing the trained classification model by using the test sample set, and correcting parameters of the classification model according to test results.

8. The method for building an industrial chain network based on bidding information according to claim 1, wherein in the step E, the enterprise characteristics of the core node and each node are visually updated in the industrial chain network, specifically: the node size of the core nodes in the industry chain network is enlarged, and the enterprise characteristics of each node are directly added to each node of the industry chain network.

9. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the industrial chain network construction method according to any one of claims 1 to 8.

10. An industry chain network construction system based on bidding information, comprising a processor and the computer readable storage medium of claim 9.