CN110378569A - Industrial relations chain building method, apparatus, equipment and storage medium - Google Patents

Industrial relations chain building method, apparatus, equipment and storage medium

Info

Publication number
CN110378569A
Authority
CN
China
Prior art keywords
industry
data
industrial
history
list item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910548138.8A
Other languages
Chinese (zh)
Inventor
崔德冠
谭涵秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd
Priority to CN201910548138.8A
Publication of CN110378569A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to industry chain building and discloses an industrial relations chain building method, apparatus, equipment and storage medium. The method comprises: reading history industry data and preprocessing it to obtain effective industry data; performing cluster analysis on the effective industry data with a clustering algorithm to obtain to-be-associated industry data of different cluster categories; converting the to-be-associated industry data into transaction-type data based on the cluster category, and performing industry association analysis on the transaction-type data with an association analysis algorithm to obtain an industry association result; and constructing the industrial relations chain according to the industry association result. Because the method starts from history industry data and applies cluster analysis and association analysis to the industry data it contains, the resulting industrial chain can cover almost all industries, giving broad industry coverage together with relatively high accuracy.

Description

Industrial relations chain building method, apparatus, equipment and storage medium
Technical field
The present invention relates to the technical field of industrial chains, and more particularly to an industrial relations chain building method, apparatus, equipment and storage medium.
Background art
Existing industrial chain modeling builds big data models from the big data that human subjects generate in production, work and daily life. It mines industrial chains mainly from the perspective of enterprises: most modeling and analysis methods rely on enterprise-level (micro) information and pay little attention to industry-wide indicators, or they analyze the industrial chain of one specific industry without analyzing industries as a whole. Government regulatory work therefore lacks an effective method for identifying industrial chains as a whole. How to construct an industrial chain comprehensively and effectively, so that different industries can be positioned accurately within it, has become an urgent problem.
The above content is only intended to aid understanding of the technical solution of the present invention and is not an admission that it constitutes prior art.
Summary of the invention
The main purpose of the present invention is to provide an industrial relations chain building method, apparatus, equipment and storage medium, aiming to solve the technical problem that the prior art cannot construct an industrial chain comprehensively and effectively while ensuring that industries are positioned accurately.
To achieve the above object, the present invention provides an industrial relations chain building method, the method comprising the following steps:
reading history industry data, and preprocessing the history industry data to obtain effective industry data;
performing cluster analysis on the effective industry data using a clustering algorithm, to obtain to-be-associated industry data of different cluster categories;
converting the to-be-associated industry data into transaction-type data based on the cluster category, and performing industry association analysis on the transaction-type data using an association analysis algorithm, to obtain an industry association result;
constructing an industrial relations chain according to the industry association result.
Preferably, the step of reading history industry data and preprocessing the history industry data to obtain effective industry data comprises:
reading the history industry data, and obtaining the data type of every item of industry data in the history industry data;
checking the data types and, when a target data type that does not belong to a preset data type is detected, performing data type conversion on the target industry data corresponding to the target data type;
taking the history industry data after data type conversion as the effective industry data.
Preferably, the step of taking the history industry data after data type conversion as the effective industry data further comprises:
writing the history industry data after data type conversion into a preset spreadsheet, to obtain an initial industry data table;
when a missing-value entry is detected in the initial industry data table, filling the missing-value entry with a numeric value, to obtain a complete industry data table;
converting the numeric data stored in the complete industry data table into numeric data of a preset dimension (scale) to obtain an effective industry data table, and taking the data in the effective industry data table as the effective industry data.
Preferably, the step of filling the missing-value entry with a numeric value when a missing-value entry is detected in the initial industry data table, to obtain a complete industry data table, comprises:
when a missing-value entry is detected in the initial industry data table, obtaining the value of the adjacent entry before and the value of the adjacent entry after the missing-value entry in its column;
calculating the average of the preceding adjacent value and the following adjacent value, and filling the missing-value entry with that average, to obtain the complete industry data table.
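The missing-value filling described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `fill_missing`, the use of `None` to mark a missing entry, and the fallback behaviour at the ends of a column are all assumptions, since the text only specifies averaging the two adjacent values.

```python
from typing import List, Optional


def fill_missing(column: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing entry (None) with the mean of its adjacent entries."""
    filled = list(column)
    for i, value in enumerate(column):
        if value is not None:
            continue
        prev_value = column[i - 1] if i > 0 else None
        next_value = column[i + 1] if i + 1 < len(column) else None
        neighbours = [v for v in (prev_value, next_value) if v is not None]
        # Assumption: with a single usable neighbour its value is copied;
        # with none, the gap is left unfilled.
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


# Hypothetical column of a history industry data table (e.g. regional GDP).
gdp_by_quarter = [100.0, None, 120.0, 130.0]
print(fill_missing(gdp_by_quarter))
```

Neighbours are read from the original column rather than from already-filled values, so two consecutive gaps do not feed into each other.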
Preferably, the step of performing cluster analysis on the effective industry data using a clustering algorithm, to obtain to-be-associated industry data of different cluster categories, comprises:
clustering all industry data in the effective industry data by a preset time granularity using the clustering algorithm, to obtain cluster data;
determining, from the cluster data, the similarity between the industry categories in the effective industry data;
assigning industry categories whose similarity exceeds a preset threshold to the same cluster category, and traversing the cluster data to obtain the to-be-associated industry data of different cluster categories.
Preferably, the step of converting the to-be-associated industry data into transaction-type data based on the cluster category, and performing industry association analysis on the transaction-type data using an association analysis algorithm to obtain an industry association result, comprises:
generating a transaction identifier for each industry category based on the preset time granularity and the cluster category;
binding the industry categories that share the same transaction identifier to obtain industry pairs, and saving the industry pairs to a transaction data table;
performing industry association analysis on the transaction-type data stored in the transaction data table using the association analysis algorithm, to obtain the industry association result.
Preferably, the step of constructing an industrial relations chain according to the industry association result comprises:
reading the lift between the industries of each industry pair included in the industry association result, each industry pair including at least two industry categories;
detecting whether the lift exceeds a preset value and, if so, determining that the industry pair is an associated industry pair;
determining the benchmark industry category and the preceding industry category of the associated industry pair according to the support and confidence of the associated industry pair;
determining the industrial relations chain according to the benchmark industry category and the preceding industry category of each associated industry pair.
In addition, to achieve the above object, the present invention also proposes an industrial relations chain building apparatus, the apparatus comprising:
a data processing module, configured to read history industry data and preprocess the history industry data to obtain effective industry data;
a data clustering module, configured to perform cluster analysis on the effective industry data using a clustering algorithm, to obtain to-be-associated industry data of different cluster categories;
an association analysis module, configured to convert the to-be-associated industry data into transaction-type data based on the cluster category, and to perform industry association analysis on the transaction-type data using an association analysis algorithm, to obtain an industry association result;
an industry association module, configured to construct an industrial relations chain according to the industry association result.
In addition, to achieve the above object, the present invention also proposes industrial relations chain building equipment, the equipment comprising: a memory, a processor, and an industrial relations chain building program stored on the memory and executable on the processor, the industrial relations chain building program being configured to implement the steps of the industrial relations chain building method described above.
In addition, to achieve the above object, the present invention also proposes a storage medium on which an industrial relations chain building program is stored, the industrial relations chain building program implementing the steps of the industrial relations chain building method described above when executed by a processor.
The present invention reads history industry data and preprocesses it to obtain effective industry data; performs cluster analysis on the effective industry data using a clustering algorithm to obtain to-be-associated industry data of different cluster categories; converts the to-be-associated industry data into transaction-type data based on the cluster category and performs industry association analysis on the transaction-type data using an association analysis algorithm to obtain an industry association result; and constructs the industrial relations chain according to the industry association result. Because the method starts from history industry data and applies cluster analysis and association analysis to cluster the industries and mine their associations, the resulting industrial chain has broad coverage together with relatively high accuracy.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of the industrial relations chain building equipment in the hardware running environment involved in the embodiments of the present invention;
Fig. 2 is a flow diagram of a first embodiment of the industrial relations chain building method of the present invention;
Fig. 3 is a flow diagram of a second embodiment of the industrial relations chain building method of the present invention;
Fig. 4 is a flow diagram of a third embodiment of the industrial relations chain building method of the present invention;
Fig. 5 is a structural block diagram of a first embodiment of the industrial relations chain building apparatus of the present invention.
The realization of the object, the functional features and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.
Referring to Fig. 1, Fig. 1 is a structural schematic diagram of the industrial relations chain building equipment in the hardware running environment involved in the embodiments of the present invention.
As shown in Fig. 1, the industrial relations chain building equipment may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as a disk memory. The memory 1005 may optionally also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art will understand that the structure shown in Fig. 1 does not limit the industrial relations chain building equipment, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
As shown in Fig. 1, the memory 1005, as a storage medium, may include an operating system, a data storage module, a network communication module, a user interface module and an industrial relations chain building program.
In the industrial relations chain building equipment shown in Fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with the user. The processor 1001 and the memory 1005 may be arranged in the industrial relations chain building equipment, which calls the industrial relations chain building program stored in the memory 1005 through the processor 1001 and executes the industrial relations chain building method provided by the embodiments of the present invention.
An embodiment of the present invention provides an industrial relations chain building method. Referring to Fig. 2, Fig. 2 is a flow diagram of the first embodiment of the industrial relations chain building method of the present invention.
In this embodiment, the industrial relations chain building method comprises the following steps:
Step S10: read history industry data, and preprocess the history industry data to obtain effective industry data;
It should be noted that the method of the present invention may be executed by a computing service device with data processing capability, such as a smartphone, a tablet computer or a personal computer (hereinafter referred to as the modeling terminal). The history industry data includes industry data of different dimensions for different years, such as industry name, year, quarter (or month), regional GDP, year-on-year growth rate, number of newly registered enterprises, number of enterprises above designated size, value added of enterprises above designated size, share of loss-making enterprises, total profit and finished goods inventory. The history industry data is normally saved in tabular form. As shown in Table 1 below, Table 1 is the history industry data table.
Table 1: History industry data table
In a specific implementation, the modeling terminal can obtain the pre-stored history industry data table from a database, read the history industry data from the table, and then preprocess the history industry data to obtain the effective industry data.
Here, preprocessing means applying data type conversion, missing-value filling and/or dimension normalization to the industry data in the history industry data table. Data type conversion converts string data into numeric data; missing-value filling fills in the blank entries of the table; dimension normalization unifies all numeric data of different dimensions (scales) to the same dimension.
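As a rough illustration of the dimension normalization step, the sketch below rescales every numeric column to the range 0 to 1. The source does not say which normalization is used, so min-max scaling is an assumption here, as are the function names and the sample column names.

```python
from typing import Dict, List


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale a column to [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def normalise_table(table: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Apply min-max scaling column by column (assumed normalization scheme)."""
    return {name: min_max_scale(column) for name, column in table.items()}


# Hypothetical columns with very different scales.
table = {
    "regional_gdp": [10.0, 20.0, 30.0],
    "new_enterprises": [1000.0, 3000.0, 2000.0],
}
print(normalise_table(table))
```

After scaling, no single indicator dominates a distance computation simply because it is measured in larger units, which matters for the clustering step that follows.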
Step S20: perform cluster analysis on the effective industry data using a clustering algorithm, to obtain to-be-associated industry data of different cluster categories;
It should be noted that in this embodiment the clustering algorithm is the k-means algorithm, which takes as input the number of clusters k and a database containing n data objects, and outputs k clusters that satisfy a minimum-variance criterion. The k-means algorithm accepts the input k and then divides the n data objects into k clusters such that objects in the same cluster have high similarity while objects in different clusters have low similarity.
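A minimal pure-Python sketch of k-means (Lloyd's iteration) is given below to make the description concrete. It is illustrative only: seeding the centroids with the first k points, the iteration cap, and the sample feature vectors are assumptions that do not come from the source.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: List[Point], k: int,
            iterations: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Partition points into k clusters; returns (labels, centroids)."""
    # Assumption: the first k points seed the centroids. Practical code
    # usually seeds at random or with k-means++.
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
    return labels, centroids


# Hypothetical normalised feature vectors for four industry categories.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labels, centroids = k_means(features, k=2)
print(labels)
```

Industry categories that receive the same label would then share a cluster category in the sense used by this embodiment.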
In addition, the cluster category is the category identifier computed by the clustering algorithm. Industry types with the same cluster category have relatively high similarity, such as agriculture and the textile industry.
As Table 1 shows, the industry data in the history industry data table is all compiled per unit of time (such as year, quarter or month). In this embodiment, the modeling terminal can therefore use the k-means algorithm to cluster the history industry data by a preset time granularity (such as year or quarter), to obtain the to-be-associated industry data of different cluster categories. For example, for the time granularity "first quarter of 2017", cluster analysis is performed on all industry types (which may be the 97 major categories of the national economic industry classification, or the industry classifications commonly used in public documents, such as the 7 strategic emerging industries and the 5 future industries).
In a specific implementation, the modeling terminal can use the clustering algorithm to cluster all industry data in the effective industry data by the preset time granularity, obtaining cluster data; then determine, from the cluster data, the similarity between the industry categories in the effective industry data; then assign industry categories whose similarity exceeds a preset threshold to the same cluster category; and finally traverse the cluster data to obtain the to-be-associated industry data of different cluster categories.
Further, in this embodiment, after obtaining the to-be-associated industry data of different cluster categories, the modeling terminal can store it in tabular form. As shown in Table 2 below, Table 2 is the to-be-associated industry data table corresponding to the to-be-associated industry data.
Table 2: To-be-associated industry data table
Industry category    Year    Quarter    Cluster category
Agriculture          2017    1          1
Fishery              2017    1          1
Real estate          2017    1          3
……                   ……      ……         ……
Step S30: convert the to-be-associated industry data into transaction-type data based on the cluster category, and perform industry association analysis on the transaction-type data using an association analysis algorithm, to obtain an industry association result;
It should be understood that association analysis, also known as association mining, means searching transaction data, relational data or other information carriers for frequent patterns, associations, correlations or causal structures that exist between sets of items or sets of objects.
In this step, the modeling terminal first converts the to-be-associated industry data into transaction-type data based on the cluster category, and then performs industry association analysis on the transaction-type data to obtain the industry association result.
Specifically, the modeling terminal can generate a transaction identifier for each industry category based on the preset time granularity and the cluster category; bind the industry categories that share the same transaction identifier to obtain industry pairs, and save the industry pairs to a transaction data table; and perform industry association analysis on the transaction-type data stored in the transaction data table using the association analysis algorithm, to obtain the industry association result.
Further, in this embodiment the transaction identifier may be an identification string generated according to the preset generation strategy "time granularity (year/quarter) + cluster category". For example, the transaction identifier is set as an 8-digit number in which, from the highest digit to the lowest, the first 4 digits are the year, digits 5-6 are the quarter (or month) and digits 7-8 are the cluster category; if the quarter (or month) or the cluster category has fewer than 2 digits, it is left-padded with 0. For example, the transaction identifier of cluster 1 in the first quarter of 2017 is 20170101.
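The 8-digit identifier scheme just described can be sketched in a few lines. The function name `make_transaction_id` and the range checks are assumptions added for illustration; the digit layout follows the text.

```python
def make_transaction_id(year: int, period: int, cluster: int) -> str:
    """Build an 8-digit ID: 4-digit year, 2-digit quarter/month, 2-digit cluster."""
    # Assumption: out-of-range inputs are rejected rather than truncated.
    if not (0 <= year <= 9999 and 0 <= period <= 99 and 0 <= cluster <= 99):
        raise ValueError("year, period or cluster does not fit the 8-digit layout")
    # Zero-pad the period and cluster fields on the left to two digits each.
    return f"{year:04d}{period:02d}{cluster:02d}"


# Cluster 1 in the first quarter of 2017, as in the example in the text.
print(make_transaction_id(2017, 1, 1))  # prints 20170101
```

Returning a string rather than an integer keeps the fixed-width layout intact if the identifier is later split back into its fields.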
In a specific implementation, after obtaining the transaction identifier of each industry category, the modeling terminal can bind the industry categories that share the same transaction identifier to obtain industry pairs, and save the industry pairs to the transaction data table. As shown in Table 3, Table 3 is the transaction data table.
Table 3: Transaction data table
Transaction ID    Industry categories
20170101          Agriculture, Fishery, ...
20170103          Real estate, ...
……                ……
It should be noted that the transaction ID (identification) in Table 3 is the transaction identifier. An industry pair is the set of associated industry categories that correspond to the same transaction identifier; for example, (Agriculture, Fishery) in Table 3 is an industry pair.
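One way to picture the binding step is to group the rows of the to-be-associated industry data table by transaction identifier. The sketch below is an assumption-laden illustration: the row layout, the function names and the use of `itertools.combinations` to enumerate pairs are not specified by the source.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

# (industry category, year, quarter, cluster category): hypothetical row layout.
Row = Tuple[str, int, int, int]


def build_transactions(rows: Iterable[Row]) -> Dict[str, List[str]]:
    """Group industry categories under their shared transaction identifier."""
    transactions: Dict[str, List[str]] = defaultdict(list)
    for industry, year, quarter, cluster in rows:
        transaction_id = f"{year:04d}{quarter:02d}{cluster:02d}"
        transactions[transaction_id].append(industry)
    return dict(transactions)


def industry_pairs(industries: Iterable[str]) -> List[Tuple[str, str]]:
    """List every unordered pair of industries within one transaction."""
    return list(combinations(sorted(industries), 2))


rows = [
    ("Agriculture", 2017, 1, 1),
    ("Fishery", 2017, 1, 1),
    ("Real estate", 2017, 1, 3),
]
table = build_transactions(rows)
print(table)
print(industry_pairs(table["20170101"]))
```

A transaction with a single industry, such as the real estate row here, yields no pairs and so contributes nothing to pairwise rules.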
Further, after obtaining the transaction data table, the modeling terminal uses the association analysis algorithm to perform industry association analysis on the transaction-type data stored in the transaction data table, to obtain the industry association result.
Step S40: construct the industrial relations chain according to the industry association result.
It should be understood that association analysis can discover connections between transactions, such as association rules or frequent itemsets. An association rule is normally measured with three indicators: support, confidence and lift. Support is the proportion of all transactions that contain both A and B; if P(A) denotes the proportion of transactions that contain A, then Support = P(A&B). Confidence is the proportion of the transactions containing A that also contain B, that is, the ratio of transactions containing both A and B to transactions containing A: Confidence = P(A&B) / P(A). Lift is the ratio of "the proportion of transactions containing A that also contain B" to "the proportion of transactions that contain B": Lift = (P(A&B) / P(A)) / P(B) = P(A&B) / (P(A) · P(B)). Lift reflects the correlation between A and B in the association rule: a lift greater than 1 indicates positive correlation, which is stronger the higher the lift; a lift less than 1 indicates negative correlation, which is stronger the lower the lift; a lift equal to 1 indicates no correlation.
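The three indicators can be computed directly from a list of transactions, as the following sketch shows. The function names and the sample transactions are hypothetical; the formulas are the ones given above.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], items: Iterable[str]) -> float:
    """Proportion of transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)


def confidence(transactions: List[Set[str]], a: str, b: str) -> float:
    """P(A&B) / P(A). Assumes A occurs in at least one transaction."""
    return support(transactions, {a, b}) / support(transactions, {a})


def lift(transactions: List[Set[str]], a: str, b: str) -> float:
    """P(A&B) / (P(A) * P(B)). Assumes A and B each occur at least once."""
    return confidence(transactions, a, b) / support(transactions, {b})


# Hypothetical transactions, one set of industry categories per transaction ID.
transactions = [
    {"Agriculture", "Textile"},
    {"Agriculture", "Textile", "Fishery"},
    {"Agriculture"},
    {"Real estate"},
]
print(support(transactions, {"Agriculture", "Textile"}))   # 2 of 4 transactions
print(confidence(transactions, "Agriculture", "Textile"))  # 2 of the 3 with Agriculture
print(lift(transactions, "Agriculture", "Textile"))        # above 1: positive correlation
```

Note that lift is symmetric in A and B while confidence is not, which is why confidence (together with support) is the natural indicator for deciding the direction of a rule.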
In a specific implementation, after the modeling terminal performs industry association analysis on the transaction-type data stored in the transaction data table, the industry association result obtained may be as shown in Table 4 below, where Table 4 is the association rule table.
Table 4: Association rule table
It should be noted that each association rule (row) in Table 4 indicates that, given support α, confidence β and lift θ, industry category B, which has an association relationship with industry category A, can be derived from industry category A. Here α and β take values between 0 and 1, and when industry category B is taken as the benchmark industry category, industry category A is the preceding industry category.
In a specific implementation, after obtaining the industry association result, the modeling terminal can read the lift between the industries of each industry pair included in the industry association result, each industry pair including at least two industry categories; detect whether the lift exceeds a preset value (generally set to 1) and, if so, determine that the industry pair is an associated industry pair; then determine the benchmark industry category and the preceding industry category of the associated industry pair according to its support and confidence; and finally determine the industrial relations chain according to the benchmark industry category and the preceding industry category of each associated industry pair.
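The lift-based filtering can be sketched as below. The `Rule` record, its field names and the numeric values are hypothetical and purely illustrative; the source does not spell out how support and confidence decide the direction of a pair, so the sketch simply takes each rule "A => B" as already oriented, with A as the preceding category and B as the benchmark category.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    preceding: str     # industry category A (antecedent of the rule)
    benchmark: str     # industry category B (consequent of the rule)
    support: float
    confidence: float
    lift: float


def associated_rules(rules: List[Rule], min_lift: float = 1.0) -> List[Rule]:
    """Keep only rules whose lift exceeds the preset value (1 by default)."""
    return [rule for rule in rules if rule.lift > min_lift]


# Illustrative values only; they are not taken from the patent.
rules = [
    Rule("Agriculture", "Textile", support=0.50, confidence=0.67, lift=1.33),
    Rule("Agriculture", "Real estate", support=0.05, confidence=0.07, lift=0.40),
]
for rule in associated_rules(rules):
    print(f"{rule.preceding} => {rule.benchmark} (lift {rule.lift})")
```

A rule with lift exactly equal to the preset value is dropped, matching the "exceeds" wording of the step.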
It should also be noted that a preceding industry category may itself be the benchmark industry category of another preceding industry category. For example, "agriculture" is the preceding industry category of "textile industry", and "agriculture" is also the benchmark industry category of "wood processing industry"; the industrial relations chain for the three is therefore "wood processing industry => agriculture => textile industry". In practice, however, the two industries of an associated industry pair can each be the preceding industry category of the other. For example, in the industry pair "agriculture, textile industry", when the preceding industry category is "agriculture", "textile industry" is the benchmark industry category; when the preceding industry category is "textile industry", "agriculture" is the benchmark industry category. The industrial relations chain finally obtained for "agriculture, textile industry, wood processing industry" is therefore "wood processing industry => agriculture <=> textile industry".
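The one-way and two-way links in this example can be derived mechanically from a set of directed (preceding, benchmark) pairs. The following is a minimal sketch under that assumption; the function name `chain_links` and the edge representation are hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (preceding industry category, benchmark industry category)


def chain_links(edges: Set[Edge]) -> List[str]:
    """Render each link once: '<=>' when both directions exist, else '=>'."""
    links: List[str] = []
    handled: Set[Edge] = set()
    for preceding, benchmark in sorted(edges):
        if (preceding, benchmark) in handled:
            continue
        if (benchmark, preceding) in edges:
            # Mutual pair: each industry is the preceding category of the other.
            links.append(f"{preceding} <=> {benchmark}")
            handled.add((benchmark, preceding))
        else:
            links.append(f"{preceding} => {benchmark}")
    return links


edges = {
    ("Wood processing", "Agriculture"),
    ("Agriculture", "Textile"),
    ("Textile", "Agriculture"),
}
print(chain_links(edges))
```

Joining the links on their shared industry ("Agriculture") reproduces the chain "wood processing industry => agriculture <=> textile industry" from the example.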
This embodiment reads history industry data and preprocesses it to obtain effective industry data; performs cluster analysis on the effective industry data using a clustering algorithm to obtain to-be-associated industry data of different cluster categories; converts the to-be-associated industry data into transaction-type data based on the cluster category and performs industry association analysis on the transaction-type data using an association analysis algorithm to obtain an industry association result; and constructs the industrial relations chain according to the industry association result. Because the method starts from history industry data and applies cluster analysis and association analysis to cluster the industries and mine their associations, the resulting industrial chain has broad coverage together with relatively high accuracy.
Referring to Fig. 3, Fig. 3 is a flow diagram of the second embodiment of the industrial relations chain building method of the present invention.
Based on the first embodiment above, in this embodiment step S10 comprises:
Step S101: read the history industry data, and obtain the data type of every item of industry data in the history industry data;
It should be understood that, because terminals store data in different data types in different scenarios, not all numeric-related data (such as dates and times) in the history industry data that the modeling terminal reads from an external terminal is stored in a numeric format (such as 2017, 2018, 1, 2, 3). The modeling terminal therefore needs to format the data that is not of a numeric data type, to keep the data format uniform.
It will be appreciated that the data type in this embodiment is the data format, that is, the rule describing how data is stored in a file or record; it may be a text format in character form or a compressed format in binary form.
In a specific implementation, the modeling terminal can use a pre-written JavaScript script that calls the parseInt() function to obtain the data format of all industry data, and then format the data that needs formatting.
Step S102: checking the data types and, when a target data type that does not belong to the preset data type is detected, performing data type conversion on the target industry data corresponding to the target data type;
It should be noted that the preset data type may be a data format set in advance; in this embodiment the preset data type is preferably numeric data.
Specifically, the modeling terminal checks the data types it has obtained. When it detects a target data type that does not belong to the preset data type (for example a character string holding a binary, octal, hexadecimal or other-base number), it calls the parseInt() function to convert the corresponding target industry data to data of the numeric data type.
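As a non-limiting illustration, the conversion of step S102 can be sketched in Python. The embodiment itself names the JavaScript parseInt() function; the Python analogue below, the function name to_number and the sample values are assumptions made for illustration only. The sketch leaves numeric values untouched and converts strings that carry a 0b, 0o or 0x prefix or a plain decimal number.

```python
def to_number(value):
    """Convert a value to a number, in the spirit of step S102."""
    if isinstance(value, (int, float)):
        return value  # already of the preset (numeric) data type
    text = str(value).strip()
    try:
        # Base 0 makes int() honour the 0b / 0o / 0x prefixes and plain decimal.
        return int(text, 0)
    except ValueError:
        # Fall back to a decimal float such as "3.5"; raises ValueError otherwise.
        return float(text)


# Hypothetical raw values of mixed types.
raw = ["2017", "0b101", "0o17", "0x1F", 42, "3.5"]
converted = [to_number(item) for item in raw]
```

Strings that are not numbers at all raise ValueError in this sketch, which leaves it to the caller to decide how such entries are handled.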
Step S103: taking the history industry data after data type conversion as the effective industry data.
In a specific implementation, after converting all string data in the history industry data to numeric data, the modeling terminal can take the converted history industry data as the effective industry data for the subsequent cluster analysis.
In this embodiment, the history industry data after data type conversion is written into a preset spreadsheet to obtain an initial industry data table; when a missing-value entry is detected in the initial industry data table, the entry is filled with a numeric value to obtain a complete industry data table; the numeric data stored in the complete industry data table is converted to numeric data of a preset scale to obtain an effective industry data table, and the data in the effective industry data table is taken as the effective industry data. The industry data used for cluster analysis therefore has a uniform scale and high accuracy, which safeguards the efficiency of the cluster analysis and improves the accuracy of the analysis result.
Referring to Fig. 4, Fig. 4 is a flow diagram of the third embodiment of the industrial relation chain building method of the present invention.
Based on the embodiments above, in this embodiment step S103 may specifically include the following steps:
Step S1030: writing the history industry data after data type conversion into a preset spreadsheet to obtain an initial industry data table;
It should be understood that a spreadsheet is a computer program that emulates a paper worksheet and displays a grid of rows and columns whose cells can hold numeric values, formulas or text. In this embodiment the preset spreadsheet is preferably an Excel sheet.
In a specific implementation, after performing data type conversion on the history industry data, the modeling terminal can write the data into the Excel sheet under the configured fields (such as industry name, time, quarter and regional GDP) to obtain the initial industry data table.
Step S1031: when a missing-value entry is detected in the initial industry data table, filling the missing-value entry with a numeric value to obtain a complete industry data table;
It should be noted that a missing-value entry is an entry in the initial spreadsheet that is empty or has not been filled in. To ensure that the subsequent industrial chain can be built smoothly, in this embodiment the modeling terminal fills a missing-value entry with a numeric value when it detects one in the initial industry data table. Specifically, on detecting a missing-value entry, the modeling terminal obtains the values of the adjacent entries before and after it in the same column, calculates the average of the two values, and fills the missing-value entry with that average to obtain the complete industry data table.
It should be understood that, to keep the filled value accurate, this approach relies on the fact that the data in each column of a table normally shares the same scale or type. For example, suppose the regional GDP of the industry type "agriculture" for the second quarter of 2017 is a missing-value entry. By query, the modeling terminal finds that the regional GDP of "agriculture" for the first and third quarters of 2017 is 20 billion and 30 billion respectively. It then calculates the average of the first-quarter and third-quarter values, (200 + 300) / 2 = 250 in units of 100 million, and uses 25 billion as the regional GDP of "agriculture" for the second quarter of 2017.
Of course, in this embodiment a missing value may also be filled with the average of the whole column containing the missing-value entry, or simply replaced with 0; this embodiment places no restriction on the specific filling method.
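As a non-limiting illustration, the filling of step S1031 can be sketched in Python for a single column. The function name fill_missing and the use of None to mark a missing-value entry are assumptions. The sketch averages the adjacent entries before and after the gap; when only one neighbour exists it uses that neighbour, and when none exists it falls back to the column average or to 0, a combination of the alternatives mentioned above that is chosen here purely for illustration.

```python
def fill_missing(column):
    """Fill None entries with the mean of the adjacent entries in the column."""
    filled = list(column)
    for i, value in enumerate(column):
        if value is not None:
            continue
        before = column[i - 1] if i > 0 else None
        after = column[i + 1] if i + 1 < len(column) else None
        neighbours = [v for v in (before, after) if v is not None]
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
        else:
            # Assumed fallback: whole-column average, or 0 for an empty column.
            known = [v for v in column if v is not None]
            filled[i] = sum(known) / len(known) if known else 0
    return filled


# Regional GDP of "agriculture" in units of 100 million, Q1 to Q3 of 2017.
gdp = [200, None, 300]
completed = fill_missing(gdp)
```

With the figures from the example, the second-quarter entry is filled with 250, i.e. 25 billion.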
Step S1032: converting the numeric data stored in the complete industry data table to numeric data of a preset scale to obtain an effective industry data table, and taking the data in the effective industry data table as the effective industry data.
It should be understood that industry data covers many kinds of quantities whose scales or units differ: regional GDP may range from 0 to 100 billion, for example, while the number of enterprises above designated size may range from 0 to 100. To improve the computational efficiency of the modeling terminal, in this embodiment the modeling terminal also rescales the numeric data stored in the complete industry data table so that it all shares the same preset scale (for example, mean 0 and variance 1). Alternatively, the values in each numeric column are sorted in ascending order and divided into several intervals by equal-frequency binning, with every value in an interval represented by the same number. For example, mapping the enterprise counts (21, 23, 24, 26, 32, 38, 40, 45) to 4 equal-frequency intervals labelled 1, 2, 3, 4 gives 21 → 1, 23 → 1, 24 → 2, 26 → 2, 32 → 3, 38 → 3, 40 → 4, 45 → 4, i.e. (21, 23, 24, 26, 32, 38, 40, 45) becomes (1, 1, 2, 2, 3, 3, 4, 4).
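As a non-limiting illustration, both rescaling options of step S1032 can be sketched in Python with the standard library only. The function names standardize and equal_frequency_bins are hypothetical. The first function rescales a column to mean 0 and variance 1 and assumes the column is not constant; the second assigns interval labels by rank, so tied values may land in different intervals.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and (population) variance 1."""
    mu = mean(values)
    sigma = pstdev(values)  # assumed non-zero, i.e. the column is not constant
    return [(v - mu) / sigma for v in values]


def equal_frequency_bins(values, n_bins):
    """Label each value 1..n_bins so that each bin holds about the same count."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, index in enumerate(order):
        labels[index] = rank * n_bins // len(values) + 1
    return labels


# Enterprise counts from the example in the text.
counts = [21, 23, 24, 26, 32, 38, 40, 45]
bins = equal_frequency_bins(counts, 4)
scaled = standardize(counts)
```

For the enterprise counts in the example, bins reproduces the labels (1, 1, 2, 2, 3, 3, 4, 4).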
In a specific implementation, the modeling terminal can convert the numeric data stored in the complete industry data table to numeric data of the preset scale to obtain the effective industry data table, and then use the data in the effective industry data table as the effective industry data for the subsequent cluster analysis.
In this embodiment, when a missing-value entry is detected in the initial industry data table, the values of the adjacent entries before and after it in the same column are obtained, their average is calculated, and the missing-value entry is filled with that average to obtain the complete industry data table. Filling each missing-value entry with the average of its adjacent entries in the same column keeps the filled data accurate and reduces random error.
In addition, an embodiment of the present invention further provides a storage medium storing an industrial relation chain building program which, when executed by a processor, implements the steps of the industrial relation chain building method described above.
Referring to Fig. 5, Fig. 5 is a structural block diagram of the first embodiment of the industrial relation chain building apparatus of the present invention.
As shown in Fig. 5, the industrial relation chain building apparatus provided by the embodiment of the present invention includes:
a data processing module 501, configured to read history industry data and pre-process the history industry data to obtain effective industry data;
a data clustering module 502, configured to perform cluster analysis on the effective industry data using a clustering algorithm to obtain the to-be-associated industry data of the different cluster categories;
an association analysis module 503, configured to convert the to-be-associated industry data into transaction-type data based on the cluster categories, and to perform inter-industry analysis on the transaction-type data using an association analysis algorithm to obtain an industry association result;
an industry association module 504, configured to build an industrial relation chain from the industry association result.
In this embodiment, history industry data is read and pre-processed to obtain effective industry data; a clustering algorithm is applied to the effective industry data to obtain the to-be-associated industry data of the different cluster categories; the to-be-associated industry data is converted into transaction-type data based on the cluster categories, and an association analysis algorithm performs inter-industry analysis on the transaction-type data to obtain an industry association result; and an industrial relation chain is built from the industry association result. Because the chain is built from history industry data, with cluster analysis and association analysis used to cluster the industries and mine the associations among them, the resulting industrial chain has wide coverage and comparatively high accuracy.
Based on the first embodiment of the industrial relation chain building apparatus above, a second embodiment of the industrial relation chain building apparatus of the present invention is proposed.
In this embodiment, the data processing module 501 is further configured to read history industry data and obtain the data type of every item of industry data in the history industry data; check the data types and, when a target data type that does not belong to the preset data type is detected, perform data type conversion on the target industry data corresponding to the target data type; and take the history industry data after data type conversion as the effective industry data.
Further, the data processing module 501 is also configured to write the history industry data after data type conversion into a preset spreadsheet to obtain an initial industry data table; when a missing-value entry is detected in the initial industry data table, fill the missing-value entry with a numeric value to obtain a complete industry data table; and convert the numeric data stored in the complete industry data table to numeric data of a preset scale to obtain an effective industry data table, taking the data in the effective industry data table as the effective industry data.
Further, the data processing module 501 is also configured to, when a missing-value entry is detected in the initial industry data table, obtain the values of the adjacent entries before and after the missing-value entry in the same column; calculate the average of the two values; and fill the missing-value entry with the average to obtain the complete industry data table.
Further, the data clustering module 502 is also configured to use a clustering algorithm to cluster all the industry data in the effective industry data according to a preset time granularity to obtain cluster data; determine the similarity between the industry categories in the effective industry data from the cluster data; and assign industry categories whose similarity exceeds a preset threshold to the same cluster category, traversing the cluster data to obtain the to-be-associated industry data of the different cluster categories.
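As a non-limiting illustration, the threshold-based grouping performed by the data clustering module can be sketched in Python. The passage above does not fix the similarity measure or the grouping procedure; cosine similarity, the union-find grouping, the function names and the sample series are all assumptions. Each industry category is represented by one value per time period, and two categories are placed in the same cluster category when their similarity exceeds the threshold.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equally long, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))


def group_by_similarity(series, threshold):
    """Group categories whose pairwise similarity exceeds the threshold."""
    names = sorted(series)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the representative of the group.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(series[a], series[b]) > threshold:
                parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical rescaled values, one per quarter.
series = {
    "agriculture": [1.0, 2.0, 3.0],
    "textiles": [2.0, 4.0, 6.1],
    "wood processing": [3.0, 1.0, -2.0],
}
clusters = group_by_similarity(series, threshold=0.9)
```

In this sample the first two series move together and form one cluster category, while the third stays on its own.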
Further, the association analysis module 503 is also configured to generate a transaction identifier for each industry category based on the preset time granularity and the cluster category; bind industry categories that share the same transaction identifier into industry pairs and save the industry pairs to a transaction data table; and perform inter-industry analysis on the transaction-type data stored in the transaction data table using an association analysis algorithm to obtain the industry association result.
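As a non-limiting illustration, the generation of transaction identifiers and industry pairs can be sketched in Python. The identifier format "period|cluster", the function names and the sample records are assumptions. Industry categories that fall into the same time period and the same cluster category receive the same transaction identifier, and every two categories sharing an identifier are bound into an industry pair.

```python
from collections import defaultdict
from itertools import combinations


def build_transactions(records):
    """Map each transaction identifier to the set of industries sharing it."""
    transactions = defaultdict(set)
    for industry, period, cluster in records:
        transactions[f"{period}|{cluster}"].add(industry)
    return dict(transactions)


def industry_pairs(transactions):
    """List (transaction id, industry, industry) rows for the transaction table."""
    rows = []
    for tid in sorted(transactions):
        for a, b in combinations(sorted(transactions[tid]), 2):
            rows.append((tid, a, b))
    return rows


# Hypothetical (industry, period, cluster category) records.
records = [
    ("agriculture", "2017Q1", 0),
    ("textiles", "2017Q1", 0),
    ("wood processing", "2017Q1", 1),
    ("agriculture", "2017Q2", 0),
    ("textiles", "2017Q2", 0),
    ("wood processing", "2017Q2", 0),
]
transactions = build_transactions(records)
pair_rows = industry_pairs(transactions)
```

The rows of pair_rows play the role of the transaction data table that the association analysis algorithm then mines.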
Further, the industry association module 504 is also configured to read the lift of each industry pair contained in the industry association result, each industry pair including at least two industry categories; check whether the lift exceeds a preset value and, if so, determine that the industry pair is an associated industry pair; determine the reference industry category and the preceding industry category of the associated industry pair from its support and confidence; and determine the industrial relation chain from the reference industry categories and preceding industry categories of all associated industry pairs.
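As a non-limiting illustration, the lift check and the choice of direction can be sketched in Python using the usual definitions of support, confidence and lift for a rule a => b. The passage above does not spell out how support and confidence decide the direction; the sketch assumes that the direction with the higher confidence makes its left-hand side the preceding industry category. The function names, the default threshold of 1.0 and the sample transactions are likewise assumptions, and both industries are assumed to occur in at least one transaction.

```python
def rule_metrics(transactions, a, b):
    """Return (support, confidence, lift) for the rule a => b."""
    n = len(transactions)
    count_a = sum(1 for t in transactions if a in t)
    count_b = sum(1 for t in transactions if b in t)
    count_ab = sum(1 for t in transactions if a in t and b in t)
    support = count_ab / n
    confidence = count_ab / count_a
    lift = confidence / (count_b / n)
    return support, confidence, lift


def orient_pair(transactions, a, b, min_lift=1.0):
    """Return (preceding, reference) for an associated pair, else None."""
    _, conf_ab, lift = rule_metrics(transactions, a, b)
    if lift <= min_lift:
        return None  # not an associated industry pair
    _, conf_ba, _ = rule_metrics(transactions, b, a)
    # Assumed rule: the more confident direction defines the preceding category.
    return (a, b) if conf_ab >= conf_ba else (b, a)


# Hypothetical transactions (sets of industries sharing a transaction id).
transactions = [
    {"agriculture", "textiles"},
    {"agriculture", "textiles"},
    {"textiles"},
    {"wood processing"},
]
support, confidence, lift = rule_metrics(transactions, "agriculture", "textiles")
direction = orient_pair(transactions, "agriculture", "textiles")
```

Lift is symmetric in the two industries, so it can only tell whether a pair is associated; the direction has to come from an asymmetric measure such as confidence.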
For other embodiments or specific implementations of the industrial relation chain building apparatus of the present invention, refer to the method embodiments above; details are not repeated here.
It should be noted that, in this document, the terms "include", "comprise" and any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article or system that includes the element.
The serial numbers of the embodiments of the present invention above are for description only and do not indicate the relative merits of the embodiments.
From the description of the embodiments above, those skilled in the art will clearly understand that the methods of the embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention in essence, or the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as read-only memory/random access memory, a magnetic disk or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, likewise falls within the scope of patent protection of the present invention.

Claims (10)

1. An industrial relation chain building method, characterized in that the method comprises:
reading history industry data, and pre-processing the history industry data to obtain effective industry data;
performing cluster analysis on the effective industry data using a clustering algorithm to obtain to-be-associated industry data of different cluster categories;
converting the to-be-associated industry data into transaction-type data based on the cluster categories, and performing inter-industry analysis on the transaction-type data using an association analysis algorithm to obtain an industry association result;
building an industrial relation chain from the industry association result.
2. The method according to claim 1, characterized in that the step of reading history industry data and pre-processing the history industry data to obtain effective industry data comprises:
reading history industry data, and obtaining the data type of every item of industry data in the history industry data;
checking the data types and, when a target data type that does not belong to a preset data type is detected, performing data type conversion on the target industry data corresponding to the target data type;
taking the history industry data after data type conversion as the effective industry data.
3. The method according to claim 2, characterized in that the step of taking the history industry data after data type conversion as the effective industry data further comprises:
writing the history industry data after data type conversion into a preset spreadsheet to obtain an initial industry data table;
when a missing-value entry is detected in the initial industry data table, filling the missing-value entry with a numeric value to obtain a complete industry data table;
converting the numeric data stored in the complete industry data table to numeric data of a preset scale to obtain an effective industry data table, and taking the data in the effective industry data table as the effective industry data.
4. The method according to claim 3, characterized in that the step of filling the missing-value entry with a numeric value to obtain a complete industry data table when a missing-value entry is detected in the initial industry data table comprises:
when a missing-value entry is detected in the initial industry data table, obtaining the values of the adjacent entries before and after the missing-value entry in the same column;
calculating the average of the value of the preceding adjacent entry and the value of the following adjacent entry, and filling the missing-value entry with the average to obtain the complete industry data table.
5. The method according to claim 4, characterized in that the step of performing cluster analysis on the effective industry data using a clustering algorithm to obtain to-be-associated industry data of different cluster categories comprises:
clustering all the industry data in the effective industry data according to a preset time granularity using a clustering algorithm to obtain cluster data;
determining the similarity between the industry categories in the effective industry data from the cluster data;
assigning industry categories whose similarity exceeds a preset threshold to the same cluster category, and traversing the cluster data to obtain the to-be-associated industry data of the different cluster categories.
6. The method according to claim 5, characterized in that the step of converting the to-be-associated industry data into transaction-type data based on the cluster categories and performing inter-industry analysis on the transaction-type data using an association analysis algorithm to obtain an industry association result comprises:
generating a transaction identifier for each industry category based on the preset time granularity and the cluster category;
binding industry categories that share the same transaction identifier into industry pairs, and saving the industry pairs to a transaction data table;
performing inter-industry analysis on the transaction-type data stored in the transaction data table using an association analysis algorithm to obtain the industry association result.
7. The method according to any one of claims 1 to 6, characterized in that the step of building an industrial relation chain from the industry association result comprises:
reading the lift of each industry pair contained in the industry association result, each industry pair including at least two industry categories;
checking whether the lift exceeds a preset value and, if so, determining that the industry pair is an associated industry pair;
determining the reference industry category and the preceding industry category of the associated industry pair from the support and confidence of the associated industry pair;
determining the industrial relation chain from the reference industry categories and preceding industry categories of all associated industry pairs.
8. An industrial relation chain building apparatus, characterized in that the apparatus comprises:
a data processing module, configured to read history industry data and pre-process the history industry data to obtain effective industry data;
a data clustering module, configured to perform cluster analysis on the effective industry data using a clustering algorithm to obtain to-be-associated industry data of different cluster categories;
an association analysis module, configured to convert the to-be-associated industry data into transaction-type data based on the cluster categories, and to perform inter-industry analysis on the transaction-type data using an association analysis algorithm to obtain an industry association result;
an industry association module, configured to build an industrial relation chain from the industry association result.
9. Industrial relation chain building equipment, characterized in that the equipment comprises: a memory, a processor, and an industrial relation chain building program stored in the memory and executable on the processor, the industrial relation chain building program being configured to implement the steps of the industrial relation chain building method according to any one of claims 1 to 7.
10. A storage medium, characterized in that an industrial relation chain building program is stored on the storage medium, and the industrial relation chain building program, when executed by a processor, implements the steps of the industrial relation chain building method according to any one of claims 1 to 7.
CN201910548138.8A 2019-06-19 2019-06-19 Industrial relations chain building method, apparatus, equipment and storage medium Pending CN110378569A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910548138.8A CN110378569A (en) 2019-06-19 2019-06-19 Industrial relations chain building method, apparatus, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110378569A true CN110378569A (en) 2019-10-25

Family

ID=68250604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910548138.8A Pending CN110378569A (en) 2019-06-19 2019-06-19 Industrial relations chain building method, apparatus, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110378569A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111209397A (en) * 2019-12-30 2020-05-29 中伯伦(北京)信息技术有限公司 Method for determining enterprise industry category
CN111784150A (en) * 2020-06-29 2020-10-16 中电工业互联网有限公司 System and method for generating industrial chain overview chart
CN112487021A (en) * 2020-11-26 2021-03-12 中国人寿保险股份有限公司 Correlation analysis method, device and equipment for business data
CN112487021B (en) * 2020-11-26 2024-04-30 中国人寿保险股份有限公司 Correlation analysis method, device and equipment of business data

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090112825A1 (en) * 2007-10-31 2009-04-30 Nec (China) Co., Ltd Entity relation mining apparatus and method
CN105786860A (en) * 2014-12-23 2016-07-20 华为技术有限公司 Data processing method and device in data modeling
CN103353880B (en) * 2013-06-20 2017-03-15 兰州交通大学 A kind of utilization distinctiveness ratio cluster and the data digging method for associating
CN106802936A (en) * 2016-12-29 2017-06-06 桂林电子科技大学 A kind of data digging method based on item collection entropy
CN107103050A (en) * 2017-03-31 2017-08-29 海通安恒(大连)大数据科技有限公司 A kind of big data Modeling Platform and method
CN107342976A (en) * 2017-05-18 2017-11-10 辛柯俊 For the mobile solution platform and method of enterprise's Analysis on Industry Chain
CN107609105A (en) * 2017-09-12 2018-01-19 电子科技大学 The construction method of big data accelerating structure
CN107844496A (en) * 2016-09-19 2018-03-27 阿里巴巴集团控股有限公司 Statistical information output intent and device
CN108334954A (en) * 2018-01-22 2018-07-27 中国平安人寿保险股份有限公司 Construction method, device, storage medium and the terminal of Logic Regression Models

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
董红斌: "《协同演化算法及其在数据挖掘中的应用》", 31 July 2008, 中国水利水电出版社 *
赵炳新: "产业关联分析中的图论模型及应用研究", 《系统工程理论与实践》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191025