CN116502755A - System and method for industrial development prediction by constructing industrial map by adopting big data - Google Patents


Info

Publication number
CN116502755A
Authority
CN
China
Prior art keywords
industry
industrial
dominant
module
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310467143.2A
Other languages
Chinese (zh)
Inventor
林文棋
吴梦荷
郝新华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Thupdi Planning Design Institute Co ltd
Original Assignee
Beijing Thupdi Planning Design Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Thupdi Planning Design Institute Co ltd
Priority to CN202310467143.2A
Publication of CN116502755A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/04 Manufacturing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Manufacturing & Machinery (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a system and a method for predicting industrial development by constructing an industrial map from big data. The system comprises the following modules: a basic data processing module, which acquires, organizes, stores, groups and summarizes data; an industrial space comprehensive processing module, which takes the input indexes and obtains the dominant industry association strength between industries; an industrial network relation module, which generates an industrial map from the dominant industry association strength; an industry prediction module, which obtains potential dominant industry vertices by graph computation and generates a future industry development prediction directory; and an evaluation screening module, which calculates the indexes of every industry in the prediction directory through an industry evaluation index system and screens out the preferred prediction results. The invention also discloses a corresponding method of using the system. The invention provides the end-to-end digital tool support required for industrial planning, makes the upstream-downstream and quantitative relations between industries explicit, identifies industrial clusters, and provides scientific support for industrial transformation and upgrading paths.

Description

System and method for industrial development prediction by constructing industrial map by adopting big data
Technical Field
The invention belongs to the technical field of big data prediction, and particularly relates to a system and a method for predicting industrial development by constructing an industrial map by adopting big data.
Background
With rapid social development, industrial transformation and upgrading has become one of the core requirements of urban development. However, the choice of direction and timing for industrial transformation still lacks a scientific and efficient driving mechanism, and the resulting deviation and lag in industrial development can cause huge losses and waste. At the same time, the rapidly changing international situation presents more, and entirely new, challenges to industrial development in every region, which requires regions to take a more macroscopic view and weigh more factors when choosing which industries to develop.
Current approaches to local industry planning fall mainly into two kinds. The first selects industries according to current industry hotspots; this approach easily ignores local conditions, and an impractical selection leaves industrial development without sustained momentum, wastes resources, and misses development opportunities for other industries. The second builds a readily implementable industrial development plan by collecting and summarizing the professional knowledge and experience of industrial planning departments, related industry practitioners and related government departments. Compared with the first, this approach fits local conditions better and is easier to implement, but it depends too heavily on the persons in charge and still lacks rational, quantitative analysis and scientific support; moreover, because the range of industries is wide and the span between them is large, the expertise of professionals can hardly cover all industries.
Therefore, an application system and a method capable of providing scientific support for local industrial development are needed: one that sorts out the macroscopic development context of all industries from existing data, fuses temporal and spatial elements, quantifies the upstream and downstream cooperative relations among industries, clarifies industrial development paths and identifies industrial clusters, so that the industrial structure can be sorted out according to specific local conditions and a scientific reference path for industrial transformation and upgrading can be obtained.
Disclosure of Invention
The invention provides a system and a method for predicting industrial development by constructing an industrial map from big data. They overcome the shortcomings of conventional industrial planning, such as an overly subjective decision process and planning choices that lack scientific support, by comprehensively and systematically quantifying the existing industrial structure and recommending industrial development directions based on prediction.
The invention discloses a system for predicting industrial development by constructing an industrial map by adopting big data, which obtains the industrial map according to the industrial data and predicts the future dominant industry based on the industrial map, and mainly comprises the following modules: the system comprises a basic data processing module, an industrial space comprehensive processing module, an industrial network relation module, an industrial prediction module and an evaluation screening module, wherein:
The basic data processing module is used for processing data, including acquiring, arranging and storing industrial basic data, and grouping and summarizing the data; the basic data processing module acquires the registration detail data of the industrial and commercial enterprises, constructs industrial basic data by combining regional basic information after cleaning the data, and then groups and gathers the content of the industrial basic data according to research scales, wherein the research scales comprise two dimensions of time and space;
the industrial space comprehensive processing module is used for inputting corresponding indexes to obtain dominant industrial association strength among industries; the industrial space comprehensive processing module uses grouping summarized data in the basic data processing module, calculates and obtains regional dominant industry according to the industrial basic data, and obtains dominant industry association strength according to the regional dominant industry;
the industrial network relation module screens the relation among industries according to the dominant industry association strength obtained by the industrial space comprehensive processing module by using a minimum spanning tree and threshold demarcation mode, reserves core data for constructing an industrial map, and performs map visualization through a complex network visualization method to generate the industrial map;
the industry prediction module is used for generating a future industry development prediction directory; starting from the regional dominant industries obtained in the industrial space comprehensive processing module, the industry prediction module obtains potential dominant industry vertices by graph computation on the industrial map established by the industrial network relation module, and generates the future industry development prediction directory;
the evaluation and screening module is used for screening future industrial prediction results meeting the index requirements, and evaluating and screening the future industrial development prediction directory obtained in the industrial prediction module through various indexes in the industrial evaluation index system to obtain the final result of industrial development prediction.
The invention also discloses a method of using the system for industrial development prediction by constructing an industrial map from big data, which comprises the following steps:
S01, basic data processing: the basic data processing module processes the data within a selected time range and spatial range; the space unit grouping, the industry category grouping level and the enterprise index are selected;
the basic data processing module first obtains the original basic data, which comprises enterprise basic information and region basic information, and places it into the industry basic database;
S02, obtaining the comprehensive association result between industries with the industrial space comprehensive processing module: according to the input index, the industrial space comprehensive processing module obtains the dominant industry association strength between industries; this comprises the following substeps:
S0201, determining regional dominant industries: according to the enterprise index selected in S01, the regional dominant industries are obtained with the dominance formula:
$$RCA_{c,i} = \frac{R_{c,i}}{R_{\Sigma c,i}}$$
wherein:
c represents a region; i represents an industry; x represents the production factor index specified for measuring industries;
$e_c$ is the economic scale correction coefficient for region c;
$GDP_c$ is the GDP value of region c; $GDP_{all}$ is the set of GDP values of all regions obtained by dividing the whole country into space units; $\max GDP_{all}$ is the maximum of that set and $\min GDP_{all}$ is its minimum;
the normalized GDP value of region c has the value range (0, 1]; when $GDP_c = \min GDP_{all}$, $GDP_c$ is taken as 1.01 times the minimum, i.e. $GDP_c = 1.01 \cdot \min GDP_{all}$, before the normalized value is calculated;
$d_{cq}$ is the adjacency coefficient between region c and a surrounding region q, where q is a region adjacent to c within the upper limit of the adjacency coefficient;
$x(c,i)$ is the index value of industry i in region c; $x(q,i)$ is the index value of industry i in region q; $x(q,i) \cdot 1/d_{cq}$, the index value of industry i in region q multiplied by the reciprocal of the adjacency coefficient between c and q, is the geographically weighted index value contributed by surrounding region q;
$x(c,i) + \sum_q x(q,i) \cdot 1/d_{cq}$ is the weighted index statistic of industry i in region c;
$\sum_c \left( x(c,i) + \sum_q x(q,i) \cdot 1/d_{cq} \right)$ is the weighted index statistic of industry i nationwide;
$\sum_i \left( x(c,i) + \sum_q x(q,i) \cdot 1/d_{cq} \right)$ is the weighted index statistic of all industries in region c;
$\sum_{c,i} \left( x(c,i) + \sum_q x(q,i) \cdot 1/d_{cq} \right)$ is the weighted index statistic of all industries nationwide;
$R_{c,i}$ is the ratio of the weighted index statistic of industry i in region c to the weighted index statistic of all industries in region c; $R_{\Sigma c,i}$ is the ratio of the weighted index statistic of industry i nationwide to the weighted index statistic of all industries nationwide; $RCA_{c,i}$ is the dominance of industry i in region c;
the dominant industries of each region are determined according to a preset dominance threshold $V_c$: when the dominance of an industry is greater than the threshold $V_c$, the industry is determined to be a dominant industry;
S0202, calculating the dominant industry association strength: the connection strength between industries is calculated from the dominant-industry results of each region and is expressed as a conditional probability, wherein the specific formula is as follows:
$$e_{scale} = \left| \frac{x_i - x_j}{x_{total}} \right|$$
wherein $P(j \mid i)$ is the corrected conditional probability that industry j is a dominant industry given that industry i is a dominant industry; $P(i \mid j)$ is the corrected conditional probability that industry i is a dominant industry given that industry j is a dominant industry; $x_i$ is the number of enterprises of industry i nationwide; $x_j$ is the number of enterprises of industry j nationwide; $x_{total}$ is the total number of enterprises nationwide; $e_{scale}$ is the industry scale correction coefficient, namely the absolute value of the difference between the nationwide enterprise share of industry i and that of industry j; $e_{dif}$ is the industry difference correction coefficient; the association strength between industries i and j is the probability that the two industries are dominant industries in the same region at the same time; it is obtained from the number of regions in which industry i and industry j are both dominant industries, the number of regions in which industry i is a dominant industry, and the number of regions in which industry j is a dominant industry;
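As an illustration of the counting behind step S0202, the following Python sketch derives uncorrected conditional probabilities from region counts and computes the scale coefficient $e_{scale}$ as defined above. It is a minimal sketch under stated assumptions: the function names and the input layout are hypothetical, and the way $e_{scale}$ and $e_{dif}$ enter the corrected probabilities is not reproduced here, so the corrections are deliberately not applied.

```python
from itertools import combinations


def cooccurrence_probabilities(dominant_by_region):
    """Uncorrected conditional probabilities between dominant industries.

    dominant_by_region: {region: set of dominant industries} (hypothetical layout).
    Returns {(i, j): P(j | i)}, the share of regions where i is dominant
    in which j is dominant as well. No correction coefficient is applied.
    """
    n_single = {}  # number of regions where an industry is dominant
    n_pair = {}    # number of regions where both industries are dominant
    for industries in dominant_by_region.values():
        for i in industries:
            n_single[i] = n_single.get(i, 0) + 1
        for i, j in combinations(sorted(industries), 2):
            n_pair[(i, j)] = n_pair.get((i, j), 0) + 1

    cond = {}
    for (i, j), n_ij in n_pair.items():
        cond[(i, j)] = n_ij / n_single[i]  # P(j | i)
        cond[(j, i)] = n_ij / n_single[j]  # P(i | j)
    return cond


def scale_correction(x_i, x_j, x_total):
    """e_scale = |(x_i - x_j) / x_total|, as defined in the text."""
    return abs((x_i - x_j) / x_total)


regions = {"r1": {"a", "b"}, "r2": {"a", "b", "c"}, "r3": {"a"}}
probabilities = cooccurrence_probabilities(regions)
```

In this toy input, industry a is dominant in three regions and industry b in two, both of which also have a, so the two conditional probabilities for the pair differ; this asymmetry is what the correction coefficients in the text are meant to address.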
S03, constructing the industrial network relation: the industrial network relation module screens the relations between industries, by means of a minimum spanning tree combined with a threshold, according to the dominant industry association strength obtained by the industrial space comprehensive processing module; the core data for constructing the industrial map is retained, and the map is visualized with a complex network visualization method to generate the industrial map;
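The screening in step S03 can be sketched as follows. This is an assumption-laden illustration, not the patented procedure: it treats the spanning tree as a minimum spanning tree over the distance 1 - strength (equivalently, a maximum spanning tree over strength), builds it with Kruskal's algorithm, and then adds every remaining edge whose strength reaches the threshold. The function name and input layout are hypothetical.

```python
def backbone_edges(strength, threshold):
    """Keep a spanning-tree backbone plus all edges at or above a threshold.

    strength: {(i, j): association strength}, one entry per undirected pair.
    Assumption: the tree minimizes 1 - strength, so the strongest links
    that keep the network connected are retained.
    """
    parent = {}

    def find(v):
        # Union-find root lookup with path halving.
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    kept = set()
    # Kruskal: visit edges from strongest to weakest, skipping those that close a cycle.
    for (i, j), _ in sorted(strength.items(), key=lambda kv: kv[1], reverse=True):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            kept.add((i, j))

    # Threshold step: also keep every strong edge that is not in the tree.
    kept.update(edge for edge, s in strength.items() if s >= threshold)
    return kept


links = {("a", "b"): 0.9, ("b", "c"): 0.2, ("a", "c"): 0.6, ("c", "d"): 0.1}
core = backbone_edges(links, threshold=0.5)
```

The tree guarantees that no industry is left isolated in the map, while the threshold keeps the map from being reduced to a tree where strong links exist.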
S04, carrying out industry prediction with the industry prediction module: starting from the regional dominant industries obtained in the industrial space comprehensive processing module, potential dominant industry vertices are obtained by graph computation on the industrial map established by the industrial network relation module, and a future industry development prediction directory is generated; this comprises the following substeps:
S0401, the regional dominant industries obtained in step S0201 are taken as the basic industry vertices; the user may add or delete industries on this basis, and the dominant industries confirmed after modification are used as the basic industry vertices for simulation prediction;
S0402, obtaining the future industry development prediction directory: the user sets the industrial transformation development conditions, namely an industry connection threshold, a number of iterations or a number of industry connection steps; starting from the basic industry vertices, the nationwide industrial map generated in step S03 is expanded outwards to obtain the associated potential dominant industry vertices, and the future industry development prediction directory is generated; an industry in this directory is called a future industry;
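The outward expansion in step S0402 can be read as a bounded breadth-first search. The sketch below assumes that reading: edges weaker than the connection threshold are ignored, and vertices are collected up to a maximum number of connection steps from the basic industry vertices. The function name, parameters and input layout are hypothetical.

```python
from collections import deque


def predict_candidates(edges, base, min_strength, max_steps):
    """Expand outwards from the base industries on the industrial map.

    edges: {(i, j): strength} for undirected links of the map.
    base: iterable of basic industry vertices (the confirmed dominant industries).
    Returns {industry: steps from the nearest base vertex} for every industry
    reachable within max_steps over links with strength >= min_strength.
    """
    adjacency = {}
    for (i, j), s in edges.items():
        if s >= min_strength:
            adjacency.setdefault(i, set()).add(j)
            adjacency.setdefault(j, set()).add(i)

    seen = set(base)
    queue = deque((v, 0) for v in seen)
    found = {}
    while queue:
        vertex, step = queue.popleft()
        if step == max_steps:
            continue  # do not expand beyond the allowed number of steps
        for neighbour in adjacency.get(vertex, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                found[neighbour] = step + 1
                queue.append((neighbour, step + 1))
    return found


industry_map = {("a", "b"): 0.9, ("b", "c"): 0.7, ("c", "d"): 0.8, ("a", "e"): 0.3}
directory = predict_candidates(industry_map, base={"a"}, min_strength=0.5, max_steps=2)
```

Here industry e is excluded because its link is below the threshold, and industry d is excluded because it lies three steps from the base vertex.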
S05, screening the future industries with the evaluation screening module to obtain the final industries, which are displayed as the industrial development prediction result; this comprises the following substeps:
S0501, the user selects the combination of industrial development indexes to be used, or uses the default combination, and the industry evaluation index system module calculates each industrial development index for the future industries;
S0502, the future industries are comprehensively ranked according to the index results and screened to obtain the final industries, which are displayed as the industrial development prediction result.
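The text does not specify how the comprehensive ranking in S0502 combines the indexes. One common choice, shown here purely as an assumption, is to min-max normalize each index, take a weighted sum, and keep the top-ranked industries. The index names, weights and function name below are hypothetical.

```python
def rank_industries(scores, weights, top_k):
    """Rank future industries by a weighted sum of min-max normalized indexes.

    scores: {industry: {index_name: value}}; weights: {index_name: weight}.
    Assumption: higher values are better for every index.
    """
    names = list(weights)
    low = {n: min(s[n] for s in scores.values()) for n in names}
    high = {n: max(s[n] for s in scores.values()) for n in names}

    def composite(values):
        total = 0.0
        for n in names:
            span = high[n] - low[n]
            normalized = (values[n] - low[n]) / span if span else 0.0
            total += weights[n] * normalized
        return total

    ranked = sorted(scores, key=lambda k: composite(scores[k]), reverse=True)
    return ranked[:top_k]


future = {
    "x": {"growth": 0.10, "jobs": 500},
    "y": {"growth": 0.30, "jobs": 100},
    "z": {"growth": 0.20, "jobs": 300},
}
final = rank_industries(future, {"growth": 0.7, "jobs": 0.3}, top_k=2)
```

With these weights growth dominates, so industry y ranks first despite having the fewest jobs.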
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. as a complete application system, the invention provides the end-to-end digital tool support required for industrial planning from the application perspective, including modules for raw data cleaning, data calculation, map drawing, industry simulation, and result evaluation and screening;
2. through research on the relations among all industries and mining of industrial spatio-temporal data, the invention objectively reflects the laws of industrial development and thereby provides scientific support for industrial transformation and upgrading paths;
3. through feature identification and analysis of the industrial map, the invention makes the upstream-downstream and quantitative relations among industries explicit and identifies industrial clusters, finally providing a scientific and practical solution for positioning local industrial development.
Drawings
FIG. 1 is a schematic diagram of a system for industrial development prediction by constructing an industrial map using big data;
FIG. 2 is a nationwide industrial map of 2000 produced by an embodiment of the present invention;
FIG. 3 is a nationwide industrial map of 2005 produced by an embodiment of the present invention;
FIG. 4 is a nationwide industrial map of 2010 produced by an embodiment of the present invention;
FIG. 5 is a 2015 national industrial map generated by an embodiment of the invention;
FIG. 6 is an industrial map of city A for 2015 generated by an embodiment of the present invention;
FIG. 7 is an industrial transformation development map of the dominant industries of city A in 2015 generated by an embodiment of the present invention;
FIG. 8 is an industrial transformation development map of the dominant industries of Guangzhou in 2010 generated by an embodiment of the present invention;
FIG. 9 is the Guangzhou industrial development prediction directory recommended by the algorithm of an embodiment of the present invention;
FIG. 10 is a comparison of the growth rates of industries recommended and not recommended by the algorithm of the present invention;
FIG. 11 is a flowchart of the method of using the system for industrial development prediction by constructing an industrial map from big data according to the present invention.
Detailed Description
To make the implementation of the system easier to understand, the application system modules of the invention and the steps for using them are described in detail below in connection with a specific embodiment. It should be noted that practitioners in the field to which the invention relates can reasonably adjust the operating steps, values and condition settings according to actual requirements without affecting the interpretability and scientific validity of the final result. The following description of the embodiment is provided by way of example to assist in understanding the usage and utility of each application system module, and does not limit the usage scenarios or effects of the invention.
The invention discloses a system for predicting industrial development by constructing an industrial map by adopting big data, which mainly comprises the following modules: a basic data processing module 1, an industrial space comprehensive processing module 2, an industrial network relation module 3, an industrial prediction module 4 and an evaluation screening module 5, as shown in fig. 1.
The basic data processing module 1 extracts the registration detail data of industrial and commercial enterprises to obtain the core data related to enterprise production, cleans the data, and combines it with national administrative division spatial distance data and local economic data to construct the industry base database; it then groups and summarizes the contents of the industry base database according to the research scale, which comprises the two dimensions of time and space.
The basic data processing module 1 includes a raw base data acquisition sub-module 11, an industry base database 12, a data cleaning sub-module 13, a space unit grouping summary sub-module 14, an industry category grouping summary sub-module 15, and an enterprise index statistics sub-module 16.
The raw base data acquisition sub-module 11 acquires the raw base data and places it into the industry base database 12. The raw base data includes enterprise basic information and region basic information. The enterprise basic information comes from the industry and commerce departments and can be selected by a specified time range and spatial range. Any time point within the statistical range may be chosen; because the current enterprise registration detail data runs from 1980 to the present, time points are in units of years and the time range runs from 1980 to the specified year. The spatial range covers enterprises nationwide. The data is accurate to the single enterprise and covers multiple fields, including the enterprise's industry information, business scope information, spatial information, scale information and registration status. The region basic information is stored when the system is established and is generally updated once a year by the raw base data acquisition sub-module 11; it includes at least the region name, the projected spatial coordinates of the region's geometric center, the region's annual per capita GDP and the names of adjacent regions.
The industry base database 12 stores the raw base data in an enterprise basic information table and a region basic information table. The enterprise basic information table contains at least the enterprise name, the enterprise registration address, the registration date and the industry field of the enterprise and, depending on the indexes to be used subsequently, fields such as tax, output value, main business income or number of employees for a designated year; the contents of these fields can be obtained from the industry and commerce data. The enterprise registration address is stored in order of provincial administrative district, prefecture-level administrative district, county-level administrative district, township-level administrative district and specific address. The industry field of the enterprise is divided into four levels according to the industry and commerce data, namely industry category, major class, medium class and minor class, and the industry of the enterprise is stored separately at each level. For example, for an enterprise whose business scope is "manufacture of acrylic fiber", the industry and commerce classification gives manufacturing as the industry category, chemical fiber manufacturing as the major class, synthetic fiber manufacturing as the medium class and acrylic fiber manufacturing as the minor class; all four levels are stored for subsequent data statistics. The region basic information table contains the region name, the region's per capita GDP and the names of adjacent regions, where the per capita GDP is the value published in the local statistical yearbook of the corresponding region and the adjacent region names are the names of the regions bordering the region.
The data cleaning sub-module 13 removes invalid and erroneous enterprise registration data from the raw base data, thereby obtaining the industry base data. Records may be removed when the enterprise registration address is null or invalid, when the industry field of the enterprise is null or invalid, or when the industry or registration address of the enterprise does not reach the required level of detail; after cleaning, the industry base data is stored in the industry base database 12.
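A minimal Python sketch of the null-or-empty part of this cleaning step is given below. The field names `address` and `industry` and the function name are hypothetical, and the checks for invalid values or insufficient level of detail are not modelled, since their rules are not specified.

```python
def clean_records(records, required=("address", "industry")):
    """Drop enterprise records whose required fields are missing or blank.

    records: list of dicts, one per enterprise (hypothetical layout).
    Only the null/empty condition is checked; validity and precision
    checks would need rules that are not given in the text.
    """
    def filled(value):
        return isinstance(value, str) and value.strip() != ""

    return [r for r in records if all(filled(r.get(f)) for f in required)]


raw = [
    {"name": "E1", "address": "County A", "industry": "acrylic fiber manufacturing"},
    {"name": "E2", "address": "", "industry": "synthetic fiber manufacturing"},
    {"name": "E3", "address": "County B", "industry": None},
]
cleaned = clean_records(raw)
```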
The space unit grouping summary sub-module 14 obtains the grouping of existing enterprises and the index summary of a given year for each space unit. Space units are divided by administrative level, such as provincial, prefecture-level or county-level administrative districts, and the industry base data is grouped according to the selected space unit and the enterprise registration address field. Considering that grouping enterprises at the provincial or prefecture level is too coarse, while a finer level such as the township level would make the subsequent data processing volume too large for little benefit, the county-level administrative district is a comparatively ideal space unit scale, and the invention takes it as the default space unit.
The industry category grouping summary sub-module 15 groups the enterprises in each space unit according to the selected industry level and the industry categories at that level, obtaining the grouping of existing enterprises and the index summary of the specified year for each industry category. The industry levels are the four levels of industry category, major class, medium class and minor class. Because finer industry analysis results help make industrial investment promotion more precise and efficient, and because large differences exist even among minor classes, the minor class is recommended as the default industry grouping level.
It is specifically noted that the space unit grouping summary sub-module 14 and the industry category grouping summary sub-module 15 may be used separately or in combination. For example, the data may first be grouped and summarized by space unit and then by industry category within each space unit; or first by industry category and then by space unit within each industry category; or by space unit only or by industry category only.
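The separate or combined use of the two grouping sub-modules can be illustrated with a small Python sketch. The record fields `county` and `minor_class` and the function name are hypothetical; they stand in for the default space unit and industry level named above.

```python
from collections import defaultdict


def group_enterprises(records, by_space=True, by_industry=True):
    """Group enterprise records by space unit, industry category, or both.

    records: list of dicts with hypothetical keys "county" and "minor_class".
    Returns {group key tuple: list of records}; the key holds the space unit,
    the industry category, or (space unit, industry category).
    """
    groups = defaultdict(list)
    for record in records:
        key = []
        if by_space:
            key.append(record["county"])
        if by_industry:
            key.append(record["minor_class"])
        groups[tuple(key)].append(record)
    return dict(groups)


enterprises = [
    {"name": "E1", "county": "A", "minor_class": "acrylic fiber manufacturing"},
    {"name": "E2", "county": "A", "minor_class": "acrylic fiber manufacturing"},
    {"name": "E3", "county": "B", "minor_class": "acrylic fiber manufacturing"},
]
by_both = group_enterprises(enterprises)
by_space_only = group_enterprises(enterprises, by_industry=False)
```

Because the combined key is a tuple of both fields, grouping by space unit first or by industry category first produces the same groups.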
The enterprise index statistics sub-module 16 is configured to compute statistics, according to a specified index, on the groups obtained by the space unit grouping and summarizing sub-module 14 and the industry category grouping and summarizing sub-module 15, where the index is one of: enterprise number, tax, production value, main income or employment number. If the specified index is the enterprise number, the enterprise index statistics sub-module 16 uses the data in the industry base database 12 to derive, for each group, the number of enterprises existing in the specified year as:
$$X_{Exist} = X_{Register} - X_{Cancel} - X_{Withdraw}$$
wherein $X_{Exist}$ represents the number of enterprises existing in the specified year, $X_{Register}$ represents the cumulative number of enterprises registered up to the specified year, $X_{Cancel}$ represents the cumulative number of enterprises deregistered up to the specified year, and $X_{Withdraw}$ represents the cumulative number of enterprises revoked up to the specified year. The values of $X_{Register}$, $X_{Cancel}$ and $X_{Withdraw}$ can be obtained statistically from the industry base database 12 after grouping. For example, if the specified year is 2015, then $X_{Register}$ is the count of enterprises registered before January 1, 2016. In this way, in combination with the space unit grouping and summarizing sub-module 14 and the industry category grouping and summarizing sub-module 15, the statistics of the specified space unit and industry category in the specified year can be obtained.
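For illustration only, the counting rule above might be sketched in Python as follows. The record layout and the field names `registered`, `cancelled` and `revoked` are hypothetical and do not come from the industry base database 12; the sketch also assumes that an enterprise is either deregistered or revoked, not both.

```python
from datetime import date

def count_existing(records, year):
    """Return X_Exist for `year` from a list of enterprise records.

    Each record is a dict with a 'registered' date and optional
    'cancelled' / 'revoked' dates (hypothetical field names).
    """
    cutoff = date(year + 1, 1, 1)  # events before this date count for `year`

    def before_cutoff(record, field):
        value = record.get(field)
        return value is not None and value < cutoff

    x_register = sum(before_cutoff(r, "registered") for r in records)
    x_cancel = sum(before_cutoff(r, "cancelled") for r in records)
    x_withdraw = sum(before_cutoff(r, "revoked") for r in records)
    return x_register - x_cancel - x_withdraw

sample = [
    {"registered": date(2010, 5, 1)},
    {"registered": date(2014, 3, 9), "cancelled": date(2015, 6, 30)},
    {"registered": date(2012, 1, 15), "revoked": date(2017, 2, 1)},
    {"registered": date(2016, 2, 2)},
]
print(count_existing(sample, 2015))  # 3 registered - 1 deregistered - 0 revoked = 2
```

Applying the same function to the records of one group (one space unit, one industry category) gives the enterprise number of that group.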
The industrial space comprehensive processing module 2 is configured to obtain a comprehensive correlation result between industries according to a specified index, and includes a regional dominant industry determination sub-module 21 and a dominant industry correlation strength calculation sub-module 22.
The regional dominant industry determination sub-module 21 uses a dominance degree to determine the dominant industries of a region. The dominance degree of the invention is calculated by an improved algorithm based on the location quotient (location entropy). The location quotient compares the share of a given production factor among all production factors of a place with the corresponding share at the national level, and thereby measures how strongly that production factor is concentrated in the place relative to the whole country; the higher the concentration, the more advantageous the production factor is nationally. In theory the location quotient eliminates the interference of differing industrial structures and differing scales of industrial development between regions. In practice, however, because some regions are constrained by their regional space, the location quotient cannot objectively reflect the development level of certain production factors, so the final dominance evaluation is biased. To overcome this problem, the invention introduces geographic weighting into the location quotient: the spatial spillover weight between any two regions is calculated, and the production factors of the regional units surrounding a region are attributed to that region through the spatial spillover weighting, so that the share of the region's production factors more objectively reflects the real concentration level of those factors in the region. This yields the dominance algorithm. The industrial dominance of each space unit is then calculated to obtain the dominant industries of each region.
The method is concretely realized as follows: the regional advantage industry determination submodule 21 calls the enterprise index statistics submodule 16 to obtain index statistics required by calculation of the dominance according to the specified index, and the dominance is specifically expressed as follows:
$$RCA_{c,i} = R_{c,i} / R_{\Sigma c,i}$$
Wherein $c$ denotes a region and $i$ denotes an industry; $x$ denotes the specified production factor index of the measured industry, namely the index specified in the enterprise index statistics sub-module 16 (called the enterprise index), which may be one of enterprise number, tax, production value, main income or employment number. $e_c$ is the economic scale correction coefficient of region $c$, obtained from the normalized reciprocal of the GDP of region $c$. $GDP_c$ denotes the GDP value of region $c$; the value for the specified year is taken directly from the regional per-capita GDP item of the regional basic information table. $GDP_{all}$ is the set of GDP values of all regions obtained by dividing the whole country by space unit (called the GDP set); $\max GDP_{all}$ is the maximum and $\min GDP_{all}$ the minimum of the GDP set. Because the normalized GDP value of region $c$ must not be 0, when $GDP_c = \min GDP_{all}$ the value of $GDP_c$ is taken as 1.01 times the minimum, i.e. $GDP_c = 1.01 \times \min GDP_{all}$, so that after calculation the normalized value lies in the range $(0, 1]$. $d_{cq}$ is the adjacency coefficient between region $c$ and a surrounding region $q$: it is 1 if region $c$ and region $q$ are directly adjacent, 2 if they are adjacent via a region $p$, and so on, with the upper limit of the adjacency coefficient set to 5. $x(c,i)$ is the index value of industry $i$ in region $c$, obtained by summing the index values of all enterprises of industry $i$ in region $c$. $x(q,i)$ is the index value of industry $i$ in region $q$, and $x(q,i) \cdot 1/d_{cq}$ is the geographically weighted index value of the surrounding region $q$, obtained by multiplying the index value of industry $i$ in region $q$ by the reciprocal of the adjacency coefficient between region $c$ and region $q$, where region $q$ lies within the upper limit of the adjacency coefficient from region $c$.
$x(c,i) + \sum_q x(q,i) \cdot \frac{1}{d_{cq}}$ denotes the weighted index statistic of industry $i$ in region $c$;
$\sum_c \left( x(c,i) + \sum_q x(q,i) \cdot \frac{1}{d_{cq}} \right)$ denotes the weighted index statistic of industry $i$ nationwide;
$\sum_i \left( x(c,i) + \sum_q x(q,i) \cdot \frac{1}{d_{cq}} \right)$ denotes the weighted index statistic of all industries in region $c$;
$\sum_{c,i} \left( x(c,i) + \sum_q x(q,i) \cdot \frac{1}{d_{cq}} \right)$ denotes the weighted index statistic of all industries nationwide. $R_{c,i}$ is the ratio of the index statistic of industry $i$ in region $c$ to the weighted index statistic of all industries in region $c$; $R_{\Sigma c,i}$ is the ratio of the index statistic of industry $i$ nationwide to the weighted index statistic of all industries nationwide; $RCA_{c,i}$ is the dominance of industry $i$ in region $c$.
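As a hedged sketch only, the geographically weighted ratio described above might be computed as follows. The function name `weighted_dominance` and the dictionary layouts are hypothetical. The sketch assumes that both ratios are formed from the weighted statistics, and it deliberately omits the economic scale correction coefficient, because the text does not state where that coefficient enters the expression.

```python
def weighted_dominance(x, adjacency, max_d=5):
    """Geographically weighted location quotient (sketch, e_c omitted).

    x:         dict region -> dict industry -> index value x(c, i)
    adjacency: dict region -> dict neighbouring region -> coefficient d_cq
    """
    industries = sorted({i for row in x.values() for i in row})

    # Weighted statistic x(c,i) + sum_q x(q,i) / d_cq for every (region, industry)
    w = {}
    for c in x:
        for i in industries:
            total = x[c].get(i, 0.0)
            for q, d in adjacency.get(c, {}).items():
                if 1 <= d <= max_d:  # ignore regions beyond the upper limit
                    total += x.get(q, {}).get(i, 0.0) / d
            w[c, i] = total

    region_sum = {c: sum(w[c, i] for i in industries) for c in x}
    industry_sum = {i: sum(w[c, i] for c in x) for i in industries}
    grand_total = sum(region_sum.values())

    rca = {}
    for (c, i), value in w.items():
        if region_sum[c] > 0 and industry_sum[i] > 0:
            r_ci = value / region_sum[c]               # share inside region c
            r_national = industry_sum[i] / grand_total  # share nationwide
            rca[c, i] = r_ci / r_national
    return rca

counts = {"c1": {"a": 3, "b": 1}, "c2": {"a": 1, "b": 3}}
print(weighted_dominance(counts, adjacency={}))
```

With no neighbours supplied the result reduces to the ordinary location quotient; supplying adjacency coefficients pulls the values of neighbouring regions into each region's statistic before the ratios are taken.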
The dominant industries of each region are determined according to a preset dominance threshold $V_c$. The threshold $V_c$ can typically be set at the 1st, 2nd or 5th percentile. Based on the dominance calculation formula, when the dominance is greater than the threshold $V_c$, the industry is judged to be a dominant industry:
$$R_{c,i} = \begin{cases} RCA_{H(c,i)}, & RCA_{c,i} > V_c \\ RCA_{L(c,i)}, & RCA_{c,i} \le V_c \end{cases}$$
Wherein $R_{c,i}$ is here the dominance judgment result for industry $i$ in region $c$; $RCA_{H(c,i)}$ indicates that industry $i$ in region $c$ is a dominant industry; $RCA_{L(c,i)}$ indicates that industry $i$ in region $c$ is a non-dominant industry; and $RCA_{c,i}$ is the calculated dominance of industry $i$ in region $c$.
The dominant industry association strength calculation sub-module 22 is configured to calculate the inter-industry association strength from the dominant industries determined for each region, and characterizes the association strength with a corrected conditional probability. The corrected conditional probability is the probability of coexistence of two industries, i.e. the probability that two industries are simultaneously dominant industries in one region, multiplied by the industry scale correction coefficient and the industry difference correction coefficient. The specific formula is as follows:
wherein $P(i_j)$ is the corrected conditional probability that industry $j$ is a dominant industry given that industry $i$ is a dominant industry; $P(j_i)$ is the corrected conditional probability that industry $i$ is a dominant industry given that industry $j$ is a dominant industry; $x_i$ is the number of enterprises of industry $i$ nationwide; $x_j$ is the number of enterprises of industry $j$ nationwide; $x_{total}$ is the total number of enterprises nationwide; $e_{scale}$ is the industry scale correction coefficient, namely the absolute value of the difference between the national enterprise share of industry $i$ and the national enterprise share of industry $j$; $e_{dif}$ is the industry difference correction coefficient, which is 1 if industries $i$ and $j$ belong to the same industry subclass, 2 if they belong to the same medium class but not the same subclass, 3 if they belong to the same major class but not the same medium class, 4 if they belong to the same industry category but not the same major class, and 5 if they do not belong to the same industry category; $\varphi_{i,j}$ represents the probability that industries $i$ and $j$ are simultaneously dominant industries in one region, namely the association strength between industry $i$ and industry $j$. The remaining quantities in the formula are the number of regions in which industry $i$ and industry $j$ are simultaneously dominant industries, the number of regions in which industry $i$ is a dominant industry, and the number of regions in which industry $j$ is a dominant industry.
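The ingredients named above can be illustrated with a short Python sketch. It computes only the uncorrected co-occurrence probabilities from region counts, plus the scale coefficient as defined in the text; how the two correction coefficients are combined into the final strength is not reproduced here, since the formula itself is not legible in this text. The function names and the `dominant` layout are hypothetical.

```python
def cooccurrence_probabilities(dominant, i, j):
    """Uncorrected conditional probabilities of co-dominance.

    dominant: dict region -> set of dominant industries in that region.
    Returns (P(j dominant | i dominant), P(i dominant | j dominant)).
    """
    n_i = sum(1 for s in dominant.values() if i in s)
    n_j = sum(1 for s in dominant.values() if j in s)
    n_ij = sum(1 for s in dominant.values() if i in s and j in s)
    p_j_given_i = n_ij / n_i if n_i else 0.0
    p_i_given_j = n_ij / n_j if n_j else 0.0
    return p_j_given_i, p_i_given_j

def scale_coefficient(x_i, x_j, x_total):
    """e_scale: absolute difference of the two national enterprise shares."""
    return abs(x_i / x_total - x_j / x_total)

regions = {
    "r1": {"a", "b"},
    "r2": {"a"},
    "r3": {"a", "b"},
    "r4": {"c"},
}
print(cooccurrence_probabilities(regions, "a", "b"))
print(scale_coefficient(200, 50, 1000))
```

In this toy data, industry a is dominant in three regions and industry b in two, both of which also have a, so the two directional probabilities differ.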
The industry network relation module 3 screens and builds the relation between industries by using a mode of MST (minimum spanning tree) and threshold definition, reserves core data which can be used for building an industry map, and performs map visualization generation through a complex network visualization method, wherein the method comprises the following steps:
the industry-related core data screening module 31 screens industry-related core data in a manner of MST (minimum spanning tree); through MST (minimum spanning tree), all industries can be guaranteed to be contained in the map, representative inter-industry connection is reserved, and the generated map is a connected map, namely, any other industry can be reached from any one industry.
The industry-associated core data screening module 31 in this embodiment uses the Prim algorithm; other MST (minimum spanning tree) algorithms, such as Kruskal, may also be used. The general idea of the Prim algorithm as applied here is: starting from any vertex, each time select the vertex joined to the current tree by the edge with the largest weight, and add that edge to the tree. Because the edges with the largest weights are selected, the tree obtained is the spanning tree of maximum total association strength. The industry-associated core data screening module 31 implements this as follows:
First step: the weighted connected graph comprises a vertex set V and an edge set E, where the elements of V are all industry names at the industry level selected in the industry category grouping and summarizing sub-module 15, and the elements of E carry the industry association strengths obtained by the dominant industry association strength calculation sub-module 22; the greater the association strength, the larger the weight of the edge. First, each vertex in the connected graph is traversed, and the 2 edges with the largest weights at each vertex are selected and stored in an edge set {e1}. After the traversal, the edge set {e1} is stored separately and deleted from the weighted connected graph, and the remaining weighted connected graph {V, e0} is the main object of the subsequent steps.
Second step: select any vertex v_1 from the vertex set V of the graph {V, e0} as the starting point, mark it as visit and put it into the point set {v_visit}; select the edge e_1 that is connected to v_1 and has the largest weight, mark it as visit and put it into the edge set {e_visit}; the vertex v_2 at the other end of e_1 is then also marked as visit and put into the point set {v_visit}. A vertex or edge marked as visit has been added to the minimum spanning tree. The point set {v_visit} and the edge set {e_visit} are both initially empty.
Third step: among the remaining vertices, according to the weights of the edges connected to vertices already marked as visit, select the edge with the largest weight and the vertex at its other end, and mark the selected vertex and edge as visit; the vertex and edge marked as visit are put into the point set {v_visit} and the edge set {e_visit} respectively.
Fourth step: repeat the third step until all points in the vertex set V are marked as visit. Because all points in the vertex set V are then marked as visit and placed in the point set {v_visit}, the point set {v_visit} is the same as the vertex set V.
Fifth step: on the basis of the fourth step, add back the edge set {e1} extracted in the first step, i.e. mark the edges in {e1} as visit and add them to the edge set {e_visit}; then continue to mark the remaining edges as visit and add them to {e_visit} in descending order of weight, until the ratio of the number of edges to the number of vertices reaches a specified value or the association strength threshold is reached. At this point the graph {v_visit, e_visit} formed by the point set {v_visit} (i.e. the vertex set V) and the edge set {e_visit} is the established MST (minimum spanning tree), which can be expressed as {V, e_visit}.
All vertex and edge data in the MST (minimum spanning tree) is the industry-related core data.
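The tree-building part of the steps above (second to fourth steps) can be sketched in Python as follows. This is a simplified illustration, not the module's implementation: it runs Prim's algorithm with the largest weight first on the whole graph and omits the first step's set-aside of each vertex's two strongest edges. The function name and the `frozenset`-keyed edge dictionary are hypothetical.

```python
import heapq

def prim_max_tree(vertices, weights):
    """Prim's algorithm choosing the largest-weight edge at every step.

    vertices: list of vertex names
    weights:  dict frozenset({u, v}) -> association strength
    Returns the set of tree edges (frozensets).
    """
    adjacent = {v: [] for v in vertices}
    for edge, w in weights.items():
        u, v = tuple(edge)
        adjacent[u].append((w, v))
        adjacent[v].append((w, u))

    start = vertices[0]
    visited = {start}
    tree = set()
    # heapq is a min-heap, so weights are negated to pop the largest first
    heap = [(-w, start, v) for w, v in adjacent[start]]
    heapq.heapify(heap)

    while heap and len(visited) < len(vertices):
        _, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # both ends already in the tree
        visited.add(v)
        tree.add(frozenset((u, v)))
        for w, nxt in adjacent[v]:
            if nxt not in visited:
                heapq.heappush(heap, (-w, v, nxt))
    return tree

pairs = [("A", "B", 0.9), ("A", "C", 0.2), ("A", "D", 0.1),
         ("B", "C", 0.8), ("B", "D", 0.3), ("C", "D", 0.7)]
strengths = {frozenset((u, v)): w for u, v, w in pairs}
tree = prim_max_tree(["A", "B", "C", "D"], strengths)
print(sorted(sorted(e) for e in tree))
```

For a connected graph the returned tree always has one edge fewer than the number of vertices, which guarantees that every industry appears in the map.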
The threshold definition module 32 is used to determine the association strength threshold. By adjusting the threshold built into the module, the stronger inter-industry associations are retained, and defining the threshold ensures that the industry association data gives the best effect. In this embodiment, iterative calculation is used to bring the map to an ideal scale in which the number of edges is 3 times the number of vertices, and the association strength threshold is determined accordingly. Here, iteration of the graph means starting from the vertices and expanding according to the weights of the edges. Specifically, after all points have been marked as visit, all industry association strength values are sorted; excluding the edges already present, edges are added in order of weight until the map reaches the ideal scale in which the number of edges is 3 times the number of vertices, and the association strength value of the last edge added is the determined association strength threshold. The threshold definition module 32 is closely related to the industry-associated core data screening module 31: the association strength threshold may be determined from the numbers of edges and vertices of the MST (minimum spanning tree), or the MST (minimum spanning tree) may be obtained from the association strength threshold.
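A minimal sketch of this threshold search, under the assumption that edges are simply added in descending order of strength until the target ratio is met, might look as follows. The function name `extend_to_ratio` and its parameters are hypothetical.

```python
def extend_to_ratio(tree_edges, weights, n_vertices, ratio=3):
    """Add the strongest remaining edges until edges >= ratio * vertices.

    tree_edges: set of edges already kept (e.g. the spanning tree)
    weights:    dict edge -> association strength for all candidate edges
    Returns (kept_edges, threshold), where threshold is the strength of the
    last edge added, or None if no edge had to be added.
    """
    kept = set(tree_edges)
    threshold = None
    remaining = sorted((e for e in weights if e not in kept),
                       key=lambda e: weights[e], reverse=True)
    for edge in remaining:
        if len(kept) >= ratio * n_vertices:
            break
        kept.add(edge)
        threshold = weights[edge]
    return kept, threshold

all_edges = {("A", "B"): 0.9, ("A", "C"): 0.2, ("A", "D"): 0.1,
             ("B", "C"): 0.8, ("B", "D"): 0.3, ("C", "D"): 0.7}
spanning = {("A", "B"), ("B", "C"), ("C", "D")}
kept, threshold = extend_to_ratio(spanning, all_edges, n_vertices=4, ratio=1)
print(len(kept), threshold)
```

The toy call uses a ratio of 1 only because a four-vertex graph has too few edges to reach a ratio of 3; in that case the function stops when the candidate edges run out.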
The industry map visualization module 33 uses a force-directed layout algorithm (the Force Atlas algorithm) to visualize the inter-industry relationships based on the screened industry-associated core data, and finally constructs the industry map. The industry map visualization module 33 can perform this visualization through the force-directed layout function of the Gephi software.
The industry prediction module 4 obtains the development direction of industrial transformation by means of graph calculation. Specifically, the existing industry base data is input, and the possible directions of industrial transformation and development from the current industry base are simulated by graph calculation on the industry map established by the industry map visualization module 33.
The industry simulation basic data confirmation module 41 displays the dominant industries of the designated region obtained from the dominant industry association strength calculation sub-module 22, allows the user to add or delete dominant industries, and takes the confirmed dominant industries as the basic data for the simulated industry prediction. The dominant industries of the designated region may be those of the year specified when the industry map was created or those of a newly specified year; in the latter case they are obtained by recalculation in the dominant industry association strength calculation sub-module 22.
The industry simulation calculation module 42 is configured to obtain a future industry development prediction directory. Specifically, the industry prediction basic data obtained by the industry simulation basic data confirmation module 41 is taken as the base industry vertices, and graph calculation expands outward from these vertices on the industry map generated by the industry network relation module 3. By setting an industry association threshold, a number of iterations and a number of industry association steps, the potentially associated dominant industry vertices are searched for and calculated, and the future industry development prediction directory is generated from the industries on those potential dominant industry vertices; an industry in the directory is called a future industry. The industry association threshold, the number of iterations and the number of industry association steps may be specified directly or left at their defaults: the association threshold defaults to 0, the number of iterations to 1 and the number of association steps to 1. For example, if the number of iterations is 2, the number of association steps is 2 and no association threshold is chosen (so that it defaults to 0), the base industry vertices expand along their edges starting from the edge with the largest weight. Because the threshold is 0, no edges need to be screened out, and all industry vertices within 1 and 2 steps are potential dominant industry vertices. Expansion then continues in the same way from these potential dominant industry vertices, and all industry vertices within 1 and 2 steps of them are likewise potential dominant industry vertices.
The number of future industries displayed is 2 times the number of existing dominant industries in the designated region, with a minimum of 5.
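One possible reading of this expansion, offered only as a sketch, is a breadth-first search that follows edges at or above the association threshold for a limited number of steps and is then repeated from the newly found vertices. Ranking candidates by the strongest edge through which they were reached is an assumption of this sketch, as are the function name `expand` and the nested-dictionary graph layout.

```python
from collections import deque

def expand(graph, base, threshold=0.0, iterations=1, steps=1):
    """Search potential dominant industries around the base industries.

    graph: dict vertex -> dict neighbour -> association strength
    base:  iterable of base industry vertices
    Returns candidate industries, strongest link first, limited to
    max(5, 2 * number of base industries).
    """
    base = set(base)
    best = {}          # candidate -> strongest edge through which it was reached
    seeds = set(base)
    for _ in range(iterations):
        seen = seeds | base
        queue = deque((s, 0) for s in sorted(seeds))
        found = set()
        while queue:
            vertex, depth = queue.popleft()
            if depth == steps:
                continue  # step limit reached for this path
            for nb, w in graph.get(vertex, {}).items():
                if w < threshold:
                    continue  # edge too weak
                if nb not in base:
                    best[nb] = max(best.get(nb, 0.0), w)
                if nb not in seen:
                    seen.add(nb)
                    found.add(nb)
                    queue.append((nb, depth + 1))
        seeds = found
        if not seeds:
            break
    limit = max(5, 2 * len(base))
    return sorted(best, key=lambda v: (-best[v], v))[:limit]

industry_map = {
    "A": {"B": 0.9, "E": 0.3},
    "B": {"A": 0.9, "C": 0.7},
    "C": {"B": 0.7, "D": 0.4},
    "D": {"C": 0.4},
    "E": {"A": 0.3},
}
print(expand(industry_map, ["A"], threshold=0.5, steps=2))
```

With a threshold of 0.5 the weak edge from A to E is ignored, so only B (one step) and C (two steps) are reported.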
The evaluation and screening module 5 is used for screening future industries through an industry evaluation index system to obtain final industries. The evaluation screening module 5 includes an industry evaluation index system module 51 and an industry development prediction result screening module 52.
The industry evaluation index system module 51 includes a number of industry evaluation indexes. In this embodiment, evaluation indexes such as industrial production efficiency, closeness centrality, industry density and industry attraction are used to evaluate the development space and potential of an industry. These indexes may all be used, or the user may select some of them; by default all industry evaluation indexes are used.
The industrial evaluation index system in the invention takes an industrial map as a core, the industrial evaluation indexes shown below are only schematic, and other existing industrial evaluation indexes can be added according to actual needs.
PRODY (production efficiency) is the weighted average of the per-capita GDP of each region, weighted by the share of industry $i$ in each region. The higher the PRODY of an industry, the greater the contribution of that industry to the local economy. The formula is as follows:
wherein $Y_c$ is the per-capita GDP of region $c$; $x_{c,i}$ is the index value of industry $i$ in region $c$, which may be the number of enterprises, the $RCA_{c,i}$ (industrial dominance value) calculated by the regional dominant industry determination sub-module 21, the main income or other data; $X_c$ is the sum of the index values of all industries in region $c$.
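Because the formula itself is not legible in this text, the following Python sketch uses a commonly used form of PRODY as an assumption: each region's share $x_{c,i}/X_c$ is normalized by the sum of that share over all regions and used to weight the region's per-capita GDP. The function name and data layout are hypothetical.

```python
def prody(x, gdp_per_capita, industry):
    """Share-weighted average per-capita GDP for one industry (assumed form).

    x:              dict region -> dict industry -> index value x_{c,i}
    gdp_per_capita: dict region -> Y_c
    """
    shares = {}
    for region, row in x.items():
        region_total = sum(row.values())          # X_c
        if region_total > 0:
            shares[region] = row.get(industry, 0.0) / region_total
    share_sum = sum(shares.values())
    if share_sum == 0:
        return 0.0
    return sum(share / share_sum * gdp_per_capita[region]
               for region, share in shares.items())

index_values = {"c1": {"a": 1, "b": 3}, "c2": {"a": 3, "b": 1}}
income = {"c1": 100.0, "c2": 200.0}
print(prody(index_values, income, "a"))  # 0.25 * 100 + 0.75 * 200 = 175.0
```

An industry concentrated in richer regions receives a higher value, which matches the interpretation given in the text.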
Closeness centrality is one of the indexes characterizing the centrality of a vertex in complex network analysis. It reflects how easily a vertex reaches other vertices, and is the reciprocal of the average industrial distance from the industry to all other industries in the industry map. The larger the index value the better: the larger the value, the closer the industry is to the center of the network and the more easily it can transform into other industries.
Density (industry density) is an index that comprehensively measures the average proximity between a future industry and the industries currently developed in the region, i.e. given the set of existing or dominant industries in the region, the capability endowment of all industries surrounding that industry. The larger the index value the better: the larger the value, the more successful industries have developed around the industry, and the stronger its future development potential. The specific formula is as follows:
wherein $w_{c,i}$ denotes the industry density of industry $i$ in region $c$; $M_{ci}$ indicates whether industry $i$ in region $c$ is a dominant industry, taking the value 1 if so and 0 otherwise; the remaining term denotes the proximity between industries. The index can be regarded as the weighted average proximity between a potential industry and the industries surrounding it on the map, and reflects the size of the accumulated capability endowment around the potential industry $i$. The greater the Density value of an industry, the more dominant industries surround it and the more likely it is to develop into a dominant industry in the future, and vice versa.
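As an illustration under stated assumptions, the density can be sketched as the proximity-weighted share of an industry's neighbours that are already dominant in the region. This form, and the use of the association strength as the proximity, are assumptions of the sketch, since the formula is not legible in this text; the names `density` and `proximity` are hypothetical.

```python
def density(proximity, dominant_in_region, industry):
    """Proximity-weighted share of neighbouring industries that are dominant.

    proximity:          dict industry -> dict other industry -> proximity
    dominant_in_region: set of industries that are dominant in the region
                        (those with M = 1)
    """
    numerator = 0.0
    denominator = 0.0
    for other, p in proximity.get(industry, {}).items():
        if other == industry:
            continue
        denominator += p
        if other in dominant_in_region:
            numerator += p
    return numerator / denominator if denominator else 0.0

proximity = {"i": {"a": 0.5, "b": 0.25, "c": 0.25}}
print(density(proximity, {"a", "c"}, "i"))  # (0.5 + 0.25) / 1.0 = 0.75
```

The value ranges from 0 (no dominant neighbours) to 1 (all neighbours dominant).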
Industry attraction indicates how attractive an industry is to a city. The higher the index value the better: the higher the value, the more attractive the industry is to the city, and the lower the value, the less attractive. The specific formula is as follows:
in the formula, $PRODY_{c,i}$ denotes the production efficiency of industry $i$ in region $c$; $EXPY_c$ denotes the overall production efficiency of region $c$; $\alpha_{c,i}$ denotes the attraction of industry $i$ to region $c$.
The industrial development prediction result screening module 52 performs comprehensive sorting according to the index results, and screens to obtain a final industrial development prediction result. The comprehensive ranking may be performed in various ways, such as assigning different weight values to the respective indicators, and then ranking by calculating the comprehensive indicators.
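One way such a weighted comprehensive ranking could be done is sketched below. The min-max normalization of each index and the weight values are assumptions of this sketch, not part of the described system; the function name `rank_industries` is hypothetical.

```python
def rank_industries(scores, weights):
    """Rank industries by a weighted sum of min-max normalised indexes.

    scores:  dict industry -> dict index name -> value (larger is better)
    weights: dict index name -> weight
    """
    normalised = {name: {} for name in scores}
    for index_name in weights:
        values = [scores[name][index_name] for name in scores]
        lo, hi = min(values), max(values)
        for name in scores:
            raw = scores[name][index_name]
            normalised[name][index_name] = (raw - lo) / (hi - lo) if hi > lo else 0.0
    composite = {
        name: sum(weights[k] * normalised[name][k] for k in weights)
        for name in scores
    }
    # highest composite first; ties broken by name for a stable order
    return sorted(composite, key=lambda name: (-composite[name], name))

candidates = {
    "x": {"prody": 10, "density": 0.2},
    "y": {"prody": 20, "density": 0.1},
    "z": {"prody": 15, "density": 0.4},
}
print(rank_industries(candidates, {"prody": 0.5, "density": 0.5}))
```

Normalizing first keeps an index with large raw values, such as PRODY, from dominating indexes that lie between 0 and 1.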
The invention also discloses a method of using the system for industrial development prediction by constructing an industrial map using big data. The method is described below taking City A of a certain province as an example. City A hopes to define its industry subdivision directions by adjusting its industrial structure, so as to promote high-quality development of the city. However, City A faces difficulties such as an unclear production system, constrained industrial transformation, weak investment attraction capability and incomplete industrial supporting elements, and needs to improve its technological innovation capability, make use of its own advantages and accelerate industrial transformation and upgrading. On the one hand, by analyzing the overall pattern of change of national industry, the invention reveals how the direction of industrial development is changing and provides a reference for choosing the overall industrial development direction of City A. On the other hand, it helps City A identify the key strategies and paths of industrial development more accurately, and supports work such as analysis of the current state of industry, regional industry comparison and simulated decision-making on the direction of industrial development. As shown in fig. 11, the method specifically comprises the following steps:
S01, basic data processing.
After the selected time range and spatial range are input, the basic data processing module 1 is used for data processing, and the space unit grouping level, the industry category grouping level and the enterprise index are selected.
S0101, acquiring original data.
The raw basic data acquiring sub-module 11 is used to acquire the registration data of industrial and commercial enterprises; the data comes from the industry and commerce departments. For the time range, any time period or time point within the statistical range can be selected. For example, in order to analyze the industrial development of City A more comprehensively, four time points (2000, 2005, 2010 and 2015) are selected, and enterprises nationwide are selected as the spatial range. The data is accurate to the individual enterprise and covers fields such as the enterprise's industry information, business scope information, spatial information, scale information and registration status. The raw base data is stored in the enterprise basic information table of the industry base database 12. The industry base database 12 also includes a regional basic information table, whose contents are stored in the database in advance. The data cleaning sub-module 13 is then called to delete invalid and erroneous enterprise registration data from the enterprise basic information table, yielding the industry base data.
S0102, selecting the grades of the space unit group and the industry category group respectively.
The grouping level is selected for the space unit grouping and summarizing sub-module 14 and for the industry category grouping and summarizing sub-module 15 respectively. The default levels are used in this application: the space unit grouping and summarizing sub-module 14 selects the county-level administrative region, and the industry category grouping and summarizing sub-module 15 selects the industry subclass. Grouping and summarizing are performed according to the selected levels to form structured data.
S0103, selecting enterprise indexes.
According to the plurality of indexes provided by the enterprise index statistics sub-module 16, one index is selected as an index for the subsequent statistics using the enterprise index statistics sub-module 16, wherein the index includes the enterprise number, tax, production value, main income, or employment number. The index selected in this embodiment is the number of enterprises.
S02, obtaining a comprehensive association result between industries by using an industrial space comprehensive processing module.
The regional dominant industry determination sub-module 21 and the dominant industry association strength calculation sub-module 22 are used to obtain a comprehensive association result between industries.
S0201, regional advantage industry definition.
The regional advantage industry is obtained using the regional advantage industry determination submodule 21 according to the index selected in S0103. In this embodiment, the specified index is the number of enterprises, so the variables related to the index in the dominance calculation formula are all specifically the number of enterprises. The regional advantage industry determination submodule 21 obtains the sum of the number of enterprises of the industry i in the region c, the number of enterprises of all industries in the region c, the number of enterprises of the industry i in the whole country and the number of enterprises of all industries in the whole country by calling the space unit grouping and summarizing submodule 14, the industry category grouping and summarizing submodule 15 and the enterprise index statistics submodule 16. And then determining the dominance screening threshold of the dominant industry by selecting the 5 th percentile, thereby determining the category of the dominant industry in each place.
S0202, calculating the dominant industry association strength.
The inter-industry association strength is calculated using the dominant industry association strength calculation sub-module 22 and the conditional probabilities are used to characterize the inter-industry association strength.
S03, constructing an industrial network relation.
The industrial network relation is constructed by using an industrial network relation module 3, the relation between industries is screened and constructed by adopting a mode of MST (minimum spanning tree) +threshold definition, core data which can be used for constructing an industrial map is reserved, and the map is visualized and generated by a complex network visualization method.
S0301, screening to obtain industry associated core data by using the industry associated core data screening module 31.
S0302, determining a contact strength threshold using a thresholding module 32.
In this example, when the graph reaches an ideal scale with 3 times the number of edges as the number of vertices, the threshold of the link strength is determined to be 0.512.
S0303, creating a visualized industrial map using the industrial map visualization module 33.
The visualized industry map is based on the screened industry-associated core data. The vertices in the map represent the industry names (all industry names at the industry subclass level set in S0102), and the edges in the map represent the industry association strength (the $\varphi_{i,j}$ calculated in S0202).
As shown in figs. 2 to 5, the corresponding national industry maps are generated for the four time points selected in S0101, namely 2000, 2005, 2010 and 2015.
S04, performing industry prediction using the industry prediction module 4.
S0401, confirming the dominant industries using the industry simulation basic data confirmation module 41. The user adds or deletes dominant industries on the basis of the dominant industries displayed by the industry simulation basic data confirmation module 41, and after confirmation the confirmed dominant industries are used as the base industry vertices for the simulated prediction.
The dominant industries of City A in 2015 are selected and displayed according to the dominant industries of City A calculated in step S0201, and dominant industries are added or deleted on this basis. The confirmed dominant industries can be displayed in the national industry map, as shown in fig. 6, giving the dominant industry map of City A in 2015.
S0402, the industrial simulation diagram calculation module 42 is used for obtaining a future industrial development prediction list.
The user sets the conditions for industrial transformation and development by inputting the industry association threshold, the number of iterations or the number of industry association steps. The industry simulation calculation module 42 then takes the industry prediction basic data as the base industry vertices and expands outward from them in the national industry map.
In this embodiment, the user only inputs the industry association threshold value greater than or equal to 0.65 and the number of industry association steps is 2, so the iteration number is 1 according to the default value, and based on this, the industry simulation diagram calculation module 42 is used to obtain potentially advantageous industry vertices in possible association, and generate a future industry development prediction directory, where the industry in the future industry development prediction directory is called the future industry.
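The outward expansion described above can be pictured as a bounded breadth-first search. The following is a minimal sketch under stated assumptions, not the applicant's implementation: the function name `predict_future_industries`, the edge dictionary layout and the sample graph are hypothetical, and the meaning of "iteration" (each iteration restarts the search from all vertices found so far) is an assumption, since the text does not define it.

```python
from collections import defaultdict, deque


def predict_future_industries(edges, base, threshold, steps, iterations=1):
    """Expand outward from the base vertices over edges whose strength is at
    least `threshold`, up to `steps` hops per iteration (assumed semantics)."""
    adj = defaultdict(list)
    for (u, v), w in edges.items():
        if w >= threshold:                    # keep sufficiently strong links only
            adj[u].append(v)
            adj[v].append(u)

    known = set(base)                         # vertices reached so far
    for _ in range(iterations):
        frontier = deque((v, 0) for v in known)
        seen = set(known)
        while frontier:
            node, dist = frontier.popleft()
            if dist == steps:                 # do not go beyond the step limit
                continue
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, dist + 1))
        known = seen                          # assumption: found vertices seed the next iteration
    return sorted(known - set(base))          # future industries exclude the base vertices


# Hypothetical graph: industry names and strengths are placeholders.
edges = {("A", "B"): 0.8, ("B", "C"): 0.7, ("C", "D"): 0.9, ("A", "E"): 0.5}
future = predict_future_industries(edges, base={"A"}, threshold=0.65, steps=2)
```

With a threshold of 0.65 and 2 steps, vertex E is excluded because its link is too weak and vertex D because it lies 3 steps from the base vertex.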
S05, screening future industries by using the evaluation screening module 5 to obtain final industries.
The evaluation screening module 5 comprises an industry evaluation index system module 51 and an industry development prediction result screening module 52.
S0501, various industrial development indexes are obtained using the industrial evaluation index system module 51.
The user selects an industry development index combination to be used, or uses a default industry development index combination, and obtains a corresponding industry development index using the industry evaluation index system module 51.
S0502, using the industry development prediction result screening module 52 to screen future industries, and obtaining industry development prediction results.
The future industries are comprehensively sorted according to the index results and screened to obtain the final industries, which are displayed as the industry development prediction result; the displayed result is shown in fig. 7.
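The text does not specify how the comprehensive sorting combines the indexes. Purely as an illustration, the sketch below ranks the future industries on each index, sums the ranks, and keeps the best entries. The rank-sum rule, the function name `rank_and_screen`, the index names and all values are assumptions introduced for this example.

```python
def rank_and_screen(scores, top_k):
    """scores: {industry: {index_name: value}}, where a higher value is better.
    Ranks the industries on every index, sums the ranks, and returns the
    top_k industries with the lowest summed rank (assumed aggregation rule)."""
    industries = list(scores)
    index_names = sorted({name for s in scores.values() for name in s})
    total_rank = {ind: 0 for ind in industries}
    for name in index_names:
        ordered = sorted(industries, key=lambda ind: scores[ind][name], reverse=True)
        for position, ind in enumerate(ordered, start=1):
            total_rank[ind] += position
    # Lower summed rank is better; ties are broken by name for determinism.
    return sorted(industries, key=lambda ind: (total_rank[ind], ind))[:top_k]


# Placeholder industries and index values.
scores = {
    "X": {"efficiency": 0.9, "density": 0.2},
    "Y": {"efficiency": 0.5, "density": 0.8},
    "Z": {"efficiency": 0.1, "density": 0.1},
}
final = rank_and_screen(scores, top_k=2)
```

Other aggregation rules, such as weighted sums of normalized index values, would fit the same interface.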
To demonstrate that the invention can faithfully reflect global and local industrial structures and development trends, and can effectively predict and evaluate the possible transformation and development directions of the industries in a region, the industrial evolution of Guangzhou City between 2010 and 2015 is used as a demonstration case. An industry development prediction directory is identified from the 2010 map of Guangzhou's current industrial development state and compared with the actual industrial development of Guangzhou in 2015 to verify the effect of the algorithm.
First, the corresponding national industry map for 2010 is generated according to steps S01-S03. Then, in step S04, based on the dominant industries of Guangzhou in that year, the number of industry association steps is set to 1 to 6 to obtain the future industries; to show more clearly which future industries are obtained at each number of association steps, the industry vertices obtained at different numbers of steps are displayed in different colors. Finally, screening according to step S05 yields the industry development prediction result for Guangzhou, which contains 35 industries that are not currently dominant industries, at the granularity of the subclasses of the national economy industry classification. Fig. 8 shows the industry development prediction result for Guangzhou in the 2010 national industry map obtained by using the present application.
The growth of the algorithm-recommended industries over 2010-2015 was verified against the actual industry economic data of Guangzhou City. As shown in fig. 9, all 35 screened industries grew, and about 50% of them had developed into dominant industries by 2015. As shown in fig. 10, the growth rate of the algorithm-recommended industries was generally higher than that of the industries not recommended; their growth rate was above average, exceeding 267% of the total industry.
This verification shows that the system and method for industrial development prediction by constructing an industrial map using big data can simulate and analyze the industrial development process relatively objectively, and that the results are relatively close to actual local development.
Finally, it should be noted that: the embodiments described above are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced with equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. The system for predicting the industrial development by constructing an industrial map by adopting big data is characterized by obtaining the industrial map according to different space-time industrial data and predicting the future dominant industry based on the industrial map, and comprises the following modules: the system comprises a basic data processing module, an industrial space comprehensive processing module, an industrial network relation module, an industrial prediction module and an evaluation screening module, wherein:
the basic data processing module is used for processing data, including acquiring, arranging and storing industrial basic data, and grouping and summarizing the data; the basic data processing module acquires the registration detail data of the industrial and commercial enterprises, constructs industrial basic data by combining regional basic information after cleaning the data, and then groups and gathers the content of the industrial basic data according to research scales, wherein the research scales comprise two dimensions of time and space;
the industrial space comprehensive processing module is used for inputting corresponding indexes to obtain dominant industrial association strength among industries; the industrial space comprehensive processing module uses grouping summarized data in the basic data processing module, calculates and obtains regional dominant industry according to the industrial basic data, and obtains dominant industry association strength according to the regional dominant industry;
The industrial network relation module screens the relation among industries according to the dominant industry association strength obtained by the industrial space comprehensive processing module by using a minimum spanning tree and threshold demarcation mode, reserves core data for constructing an industrial map, and performs map visualization through a complex network visualization method to generate the industrial map;
the industry prediction module is used for generating a future industry development prediction directory; based on the industrial map established by the industrial network relation module and the regional dominant industries obtained in the industrial space comprehensive processing module, the industry prediction module obtains potential dominant industry vertices by graph calculation and generates the future industry development prediction directory;
the evaluation and screening module is used for screening future industrial prediction results meeting the index requirements, and evaluating and screening the future industrial development prediction directory obtained in the industrial prediction module through various indexes in the industrial evaluation index system to obtain the final result of industrial development prediction.
2. The system for industrial development prediction using big data to construct an industrial map of claim 1, wherein the base data processing module comprises an original base data acquisition sub-module, an industrial base database, an existing enterprise count statistics sub-module, a spatial unit grouping summary sub-module, an industry category grouping summary sub-module, and an enterprise index statistics sub-module;
The original basic data acquisition sub-module is used for acquiring original basic data and placing the acquired original basic data into an industrial basic database; the original basic data comprises enterprise basic information and region basic information; the basic information of the enterprise is derived from the industry and commerce departments, and the data is accurate to a single enterprise, and at least comprises the industry information, the operating range information, the space information, the scale information and the registration state of the enterprise; the region basic information at least comprises a region name, a region average GDP and adjacent region names;
the industry base database is used for storing original base data, and comprises: an enterprise basic information table and a region basic information table; the enterprise basic information table at least comprises an enterprise name, an enterprise registration address, a registration date and an industry field to which the enterprise belongs; the region basic information table at least comprises a region name, a region average GDP and a field;
the data cleaning submodule is used for deleting invalid and wrong enterprise data in the original basic data to obtain industrial basic data;
the space unit grouping and summarizing submodule is used for obtaining existing enterprise grouping and index summarizing of the specified year of the specified space unit;
the industry category grouping and summarizing submodule is used for grouping enterprises according to the selected industry level and the industry category of the level to obtain existing enterprise grouping and index summarizing of the designated year of each industry category;
the enterprise index statistics sub-module uses the industrial basic data in the industrial basic database, obtains groups by using the space unit grouping and summarizing sub-module and the industry category grouping and summarizing sub-module, and counts the groups according to the designated index; the index is one of the number of enterprises, tax, output value, main business revenue, or employment.
3. The system for industrial development prediction using big data to construct an industrial map according to claim 1, wherein the industrial space comprehensive processing module comprises a regional dominant industry determination sub-module and a dominant industry correlation strength calculation sub-module;
the regional advantage industry determination submodule determines a regional advantage industry using the dominance degree; the regional advantage industry determination submodule firstly calls an enterprise index statistics submodule to obtain index statistics values required by the calculation of the dominance, and then carries out the calculation of the dominance, wherein the specific formula of the dominance is as follows:
$RCA_{c,i} = R_{c,i} / R_{\sum c,i}$
wherein $c$ represents a region; $i$ represents an industry; $x$ represents the specified production factor index used to measure an industry; $e_c$ is the economic scale correction coefficient for region $c$; $GDP_c$ represents the GDP value of region $c$; $GDP_{all}$ is the set of the GDP values of all regions obtained after the whole country is divided into regions according to space units; $\max GDP_{all}$ is the maximum in the GDP set and $\min GDP_{all}$ is the minimum in the GDP set; for the normalized GDP value of region $c$, when $GDP_c = \min GDP_{all}$, $GDP_c$ takes a value 1.01 times the minimum GDP value, i.e. $GDP_c = \min GDP_{all} \times 1.01$, before the normalized value is calculated, so that the normalized value lies in the range $(0, 1]$; $d_{cq}$ represents the adjacency coefficient between region $c$ and a region $q$ surrounding region $c$; $x(c,i)$ is the index value of industry $i$ in region $c$; $x(q,i)$ is the index value of industry $i$ in region $q$; $x(q,i) \cdot 1/d_{cq}$ multiplies the index value of industry $i$ in region $q$ by the reciprocal of the adjacency coefficient between region $c$ and region $q$ to obtain the geographically weighted index value of the surrounding region $q$, wherein region $q$ is a region adjacent to region $c$ within the upper limit of the adjacency coefficient;
$x(c,i) + \sum_q x(q,i) \cdot 1/d_{cq}$ represents the weighted index statistic of industry $i$ in region $c$;
$\sum_c \left( x(c,i) + \sum_q x(q,i) \cdot 1/d_{cq} \right)$ represents the weighted index statistic of industry $i$ nationwide;
$\sum_i \left( x(c,i) + \sum_q x(q,i) \cdot 1/d_{cq} \right)$ represents the weighted index statistic of all industries in region $c$;
$\sum_{c,i} \left( x(c,i) + \sum_q x(q,i) \cdot 1/d_{cq} \right)$ represents the weighted index statistic of all industries nationwide; $R_{c,i}$ is the ratio of the index statistic of industry $i$ in region $c$ to the weighted index statistic of all industries in region $c$; $R_{\sum c,i}$ is the ratio of the index statistic of industry $i$ nationwide to the weighted index statistic of all industries nationwide; $RCA_{c,i}$ is the dominance of industry $i$ in region $c$;
according to a preset dominance threshold $V_c$, the dominant industries of each region are determined: when the dominance is greater than the threshold $V_c$, the industry is determined to be a dominant industry;
wherein $R_{c,i}$ is the dominance judgment result for industry $i$ in region $c$, $RCA_{H(c,i)}$ indicates that industry $i$ in region $c$ is a dominant industry, $RCA_{L(c,i)}$ indicates that industry $i$ in region $c$ is a non-dominant industry, and $RCA_{c,i}$ is the dominance calculation result for industry $i$ in region $c$;
the dominant industry association strength calculation sub-module is used for calculating the inter-industry association strength based on the calculation result of the dominant industry of each area, and the inter-industry association strength is represented by using conditional probability, namely the probability that two industries are dominant industries in one area at the same time, and the specific formula is as follows:
$e_{scale} = \left| (x_i - x_j) / x_{total} \right|$
wherein $\varphi_{i,j}$ represents the dominant industry association strength between industry $i$ and industry $j$, namely the probability that industry $i$ and industry $j$ are dominant industries in one region at the same time; $P(i_j)$ represents the corrected conditional probability that industry $j$ is a dominant industry given that industry $i$ is a dominant industry; $P(j_i)$ represents the corrected conditional probability that industry $i$ is a dominant industry given that industry $j$ is a dominant industry; $x_i$ is the number of enterprises in industry $i$ nationwide; $x_j$ is the number of enterprises in industry $j$ nationwide; $x_{total}$ is the total number of enterprises nationwide; $e_{scale}$ is the industry scale correction coefficient, i.e. the absolute value of the difference between the nationwide enterprise share of industry $i$ and that of industry $j$; $e_{dif}$ is the industry difference correction coefficient: it is 1 if industries $i$ and $j$ belong to the same industry subclass, rises to 2, 3 and 4 as the lowest classification level that the two industries share moves up one level at a time, and is 5 if they do not belong to the same top-level industry category; the formula further uses the number of regions in which industry $i$ and industry $j$ are dominant industries at the same time, the number of regions in which industry $i$ is a dominant industry, and the number of regions in which industry $j$ is a dominant industry.
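By way of illustration only, the quantities defined in this claim can be sketched as follows. The sketch computes the geographically weighted statistic $x(c,i) + \sum_q x(q,i) \cdot 1/d_{cq}$, the two ratios, the dominance, the dominant industries per region, the raw co-occurrence counts and $e_{scale}$. It is not the claimed formula in full: the correction coefficient $e_c$ and the way $e_{scale}$ and $e_{dif}$ enter the association strength are not recoverable from the text and are omitted. All function names, the default threshold of 1.0 and the sample data are hypothetical.

```python
def dominance(x, neighbors):
    """x: {(region, industry): index value};
    neighbors: {region: {adjacent region: adjacency coefficient d_cq}}.
    Returns {(region, industry): RCA}, without the e_c correction."""
    regions = sorted({c for c, _ in x})
    industries = sorted({i for _, i in x})

    def weighted(c, i):
        # x(c,i) + sum over adjacent regions q of x(q,i) * 1/d_cq
        own = x.get((c, i), 0.0)
        return own + sum(x.get((q, i), 0.0) / d for q, d in neighbors.get(c, {}).items())

    w = {(c, i): weighted(c, i) for c in regions for i in industries}
    region_total = {c: sum(w[c, i] for i in industries) for c in regions}
    industry_total = {i: sum(w[c, i] for c in regions) for i in industries}
    grand_total = sum(region_total.values())

    rca = {}
    for c in regions:
        for i in industries:
            if region_total[c] == 0 or industry_total[i] == 0:
                rca[c, i] = 0.0               # nothing to compare against
                continue
            r_ci = w[c, i] / region_total[c]            # share inside region c
            r_all = industry_total[i] / grand_total     # nationwide share
            rca[c, i] = r_ci / r_all
    return rca


def dominant_industries(rca, v_c=1.0):
    """Industries whose dominance exceeds the threshold V_c, per region.
    The default of 1.0 is an assumption; the claim only says 'preset'."""
    result = {}
    for (c, i), value in rca.items():
        if value > v_c:
            result.setdefault(c, set()).add(i)
    return result


def cooccurrence_probabilities(dominant, i, j):
    """Uncorrected conditional probabilities from region counts:
    (share of i-dominant regions where j is dominant, and vice versa)."""
    n_i = sum(1 for inds in dominant.values() if i in inds)
    n_j = sum(1 for inds in dominant.values() if j in inds)
    n_ij = sum(1 for inds in dominant.values() if i in inds and j in inds)
    return (n_ij / n_i if n_i else 0.0, n_ij / n_j if n_j else 0.0)


def scale_correction(x_i, x_j, x_total):
    """e_scale = |(x_i - x_j) / x_total|."""
    return abs((x_i - x_j) / x_total)


# Placeholder data: two regions, two industries, no adjacency weighting.
x = {("c1", "a"): 8, ("c1", "b"): 2, ("c2", "a"): 2, ("c2", "b"): 8}
rca = dominance(x, neighbors={})
dominant = dominant_industries(rca)
```

In this toy data each region specializes in a different industry, so the two industries are never dominant in the same region and both uncorrected conditional probabilities are zero.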
4. The system for industrial development prediction using big data to construct an industrial map of claim 1, wherein the industrial network relationship module comprises: the system comprises an industry association core data screening module, a threshold value demarcating module and an industry map visualization module;
the industrial association core data screening module screens industrial association core data in a minimum spanning tree mode; the weighted connected graph used comprises a vertex set V and an edge set E, wherein the elements in V are all industry names under the industry level selected in the industry category grouping and summarizing submodule, and the elements in E are the industry connection strengths obtained by the dominant industry connection strength calculation submodule; the vertices and edges retained in the minimum spanning tree are the industry-related core data;
The threshold value demarcation module is used for determining a connection strength threshold value, so that only edges with industry connection strength larger than the connection strength threshold value are reserved in the minimum spanning tree;
the industrial map visualization module is used for visualizing the mutual connection among industries by adopting a force guiding algorithm based on the industrial associated core data screened by the industrial associated core data screening module, and finally an industrial map is constructed.
5. The system for industrial development prediction by using big data to construct industrial atlas according to claim 4, wherein the specific way for the industrial associated core data screening module to screen the industrial associated core data by using the minimum spanning tree method is as follows:
the first step: the weighted connected graph comprises a vertex set $V$ and an edge set $E$; the elements in $V$ are all industry names under the industry level selected in the industry category grouping and summarizing submodule, the elements in $E$ are the industry connection strengths obtained by the dominant industry connection strength calculation submodule, and the industry connection strength value is the weight of the edge; each vertex in the connected graph is first traversed, and the 2 edges with the largest weights at each vertex are selected and stored into an edge set $\{e1\}$; after the traversal, the edge set $\{e1\}$ is stored separately and deleted from the weighted connected graph, and the remaining weighted connected graph $\{V, e0\}$ is taken as the object of the subsequent steps;
the second step: a vertex $v_1$ is arbitrarily selected from the vertex set $V$ of the weighted connected graph $\{V, e0\}$ as the starting point, marked as visit and put into the point set $\{v_{visit}\}$; the edge $e_1$ that is connected to $v_1$ and has the greatest weight is selected, marked as visit and put into the edge set $\{e_{visit}\}$; the vertex $v_2$ at the other end of $e_1$ is then also marked as visit and put into the point set $\{v_{visit}\}$, so that the selected vertices and edge are all marked as visit;
the third step: among the edges connecting the vertices marked as visit to the remaining vertices, the edge with the largest weight and the vertex at its other end are selected and marked as visit; the vertex and edge marked as visit are put into the point set $\{v_{visit}\}$ and the edge set $\{e_{visit}\}$ respectively;
the fourth step: the third step is repeated until all points in the vertex set $V$ are marked as visit, i.e., the point set $\{v_{visit}\}$ is the same as the vertex set $V$;
the fifth step: on the basis of the fourth step, all edges in the edge set $\{e1\}$ are marked as visit and added to the edge set $\{e_{visit}\}$; the remaining edges are then marked as visit and added to $\{e_{visit}\}$ in descending order of weight until the ratio of the number of edges to the number of vertices reaches the specified ratio or the connection strength threshold is reached, at which point the graph $\{v_{visit}, e_{visit}\}$ formed by the point set $\{v_{visit}\}$ and the edge set $\{e_{visit}\}$ is the completed minimum spanning tree;
the data of all vertices and edges in the minimum spanning tree is the industry-related core data.
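For illustration, the five steps of this claim can be followed literally in a short sketch. Note that the steps always select the heaviest edges, so the sketch does the same even though the claim uses the term minimum spanning tree. The function name `screen_core_edges`, the edge dictionary layout, the early stop when the remaining graph is disconnected, and the sample data are assumptions made for this example; only the ratio-based stopping rule of the fifth step is shown.

```python
def screen_core_edges(vertices, edges, ratio=3.0):
    """vertices: iterable of names; edges: {(u, v): weight}.
    Returns the set of retained edges as sorted (u, v) tuples."""
    vertices = sorted(vertices)
    if not vertices:
        return set()
    norm = {tuple(sorted(pair)): w for pair, w in edges.items()}

    # Step 1: set aside the 2 heaviest edges of every vertex as e1.
    e1 = set()
    for v in vertices:
        incident = sorted((e for e in norm if v in e), key=lambda e: norm[e], reverse=True)
        e1.update(incident[:2])
    rest = {e: w for e, w in norm.items() if e not in e1}

    # Steps 2-4: grow a tree over the remaining graph, always taking the
    # heaviest edge that links a visited vertex to an unvisited one.
    visited, tree = {vertices[0]}, set()
    while len(visited) < len(vertices):
        frontier = [e for e in rest if (e[0] in visited) != (e[1] in visited)]
        if not frontier:                      # remaining graph is disconnected: stop early
            break
        best = max(frontier, key=lambda e: rest[e])
        tree.add(best)
        visited.update(best)

    # Step 5: merge e1, then add the heaviest unused edges until the
    # number of edges reaches ratio * number of vertices.
    kept = tree | e1
    limit = int(ratio * len(vertices))
    for e in sorted(norm, key=lambda e: norm[e], reverse=True):
        if len(kept) >= limit:
            break
        kept.add(e)
    return kept


# Placeholder graph with four industries.
sample = {("A", "B"): 0.9, ("A", "C"): 0.8, ("A", "D"): 0.7,
          ("B", "C"): 0.6, ("B", "D"): 0.5, ("C", "D"): 0.4}
core = screen_core_edges("ABCD", sample, ratio=1.0)
```

In this small example the first step already sets aside five of the six edges, so the tree-growing steps find nothing to add; on larger graphs the remaining edges after the first step are what the second to fourth steps operate on.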
6. The system for industrial development prediction by constructing an industrial map using big data according to claim 1, wherein the industrial prediction module comprises an industrial simulation basic data confirmation module and an industrial simulation map calculation module;
the industrial simulation basic data confirmation module is used for displaying the dominant industry in the appointed area obtained from the dominant industry association strength calculation sub-module, allowing a user to add or delete the dominant industry and confirm the dominant industry, and taking the dominant industry as simulated industrial prediction basic data;
the industrial simulation diagram calculation module is used for obtaining a future industrial development prediction directory, specifically: taking the industry prediction basic data obtained by the industrial simulation basic data confirmation module as basic industry vertices, and expanding outward from these vertices by graph calculation in the industrial map generated by the industrial network relation module; searching for and calculating the potential dominant industry vertices that may be associated by setting an industry connection threshold, a number of iterations and a number of industry connection steps; and generating a future industrial development prediction directory from the industries represented by the potential dominant industry vertices, the industries in the directory being called future industries.
7. The system for industrial development prediction by constructing an industrial map with big data according to claim 1, wherein the evaluation screening module comprises an industrial evaluation index system module and an industrial development prediction result screening module;
the industry evaluation index system module comprises various industry evaluation indexes for evaluating the development space and potential of the industry;
and the industrial development prediction result screening module screens to obtain a final industrial development prediction result after comprehensively sequencing according to the industrial evaluation index results.
8. The system for industrial development prediction using big data to construct an industrial map according to claim 7, wherein the industrial evaluation index comprises at least one of industrial production efficiency, closeness centrality, industrial density, and industrial attraction.
9. A method for industrial development prediction by constructing an industrial map using big data, using the system according to claim 1, comprising the following steps:
S01, performing basic data processing; the basic data processing module is used for data processing within a selected time range and space range; the level of space unit grouping, the level of industry category grouping, and the enterprise index are selected;
The basic data processing module firstly obtains original basic data; the original basic data comprises enterprise basic information and region basic information; the obtained original basic data are put into an industry basic database;
S02, obtaining a comprehensive association result between industries by using the industrial space comprehensive processing module; according to the input index, using the industrial space comprehensive processing module to obtain the dominant industry association strength between industries; the method comprises the following specific substeps:
S0201, regional dominant industry demarcation; according to the enterprise index selected in S01, the regional dominant industries are obtained by using a dominance formula, wherein the dominance formula is as follows:
$RCA_{c,i} = R_{c,i} / R_{\sum c,i}$
wherein $c$ represents a region; $i$ represents an industry; $x$ represents the specified production factor index used to measure an industry; $e_c$ is the economic scale correction coefficient for region $c$; $GDP_c$ represents the GDP value of region $c$; $GDP_{all}$ is the set of the GDP values of all regions obtained after the whole country is divided into regions according to space units; $\max GDP_{all}$ is the maximum in the GDP set and $\min GDP_{all}$ is the minimum in the GDP set; for the normalized GDP value of region $c$, when $GDP_c = \min GDP_{all}$, $GDP_c$ takes a value 1.01 times the minimum GDP value, i.e. $GDP_c = \min GDP_{all} \times 1.01$, before the normalized value is calculated, so that the normalized value lies in the range $(0, 1]$; $d_{cq}$ represents the adjacency coefficient between region $c$ and a region $q$ surrounding region $c$; $x(c,i)$ is the index value of industry $i$ in region $c$; $x(q,i)$ is the index value of industry $i$ in region $q$; $x(q,i) \cdot 1/d_{cq}$ multiplies the index value of industry $i$ in region $q$ by the reciprocal of the adjacency coefficient between region $c$ and region $q$ to obtain the geographically weighted index value of the surrounding region $q$, wherein region $q$ is a region adjacent to region $c$ within the upper limit of the adjacency coefficient;
$x(c,i) + \sum_q x(q,i) \cdot 1/d_{cq}$ represents the weighted index statistic of industry $i$ in region $c$;
$\sum_c \left( x(c,i) + \sum_q x(q,i) \cdot 1/d_{cq} \right)$ represents the weighted index statistic of industry $i$ nationwide;
$\sum_i \left( x(c,i) + \sum_q x(q,i) \cdot 1/d_{cq} \right)$ represents the weighted index statistic of all industries in region $c$;
$\sum_{c,i} \left( x(c,i) + \sum_q x(q,i) \cdot 1/d_{cq} \right)$ represents the weighted index statistic of all industries nationwide; $R_{c,i}$ is the ratio of the index statistic of industry $i$ in region $c$ to the weighted index statistic of all industries in region $c$; $R_{\sum c,i}$ is the ratio of the index statistic of industry $i$ nationwide to the weighted index statistic of all industries nationwide; $RCA_{c,i}$ is the dominance of industry $i$ in region $c$;
according to a preset dominance threshold $V_c$, the dominant industries of each region are determined: when the dominance is greater than the threshold $V_c$, the industry is determined to be a dominant industry;
S0202, calculating the dominant industry association strength; the inter-industry connection strength is calculated based on the dominant industry results of each region and is represented using conditional probability, the specific formula being as follows:
$e_{scale} = \left| (x_i - x_j) / x_{total} \right|$
wherein $P(i_j)$ represents the corrected conditional probability that industry $j$ is a dominant industry given that industry $i$ is a dominant industry; $P(j_i)$ represents the corrected conditional probability that industry $i$ is a dominant industry given that industry $j$ is a dominant industry; $x_i$ is the number of enterprises in industry $i$ nationwide; $x_j$ is the number of enterprises in industry $j$ nationwide; $x_{total}$ is the total number of enterprises nationwide; $e_{scale}$ is the industry scale correction coefficient, i.e. the absolute value of the difference between the nationwide enterprise share of industry $i$ and that of industry $j$; $e_{dif}$ is the industry difference correction coefficient; $\varphi_{i,j}$ represents the probability that industries $i$ and $j$ are dominant industries in one region at the same time, namely the association strength between industries $i$ and $j$; the formula further uses the number of regions in which industry $i$ and industry $j$ are dominant industries at the same time, the number of regions in which industry $i$ is a dominant industry, and the number of regions in which industry $j$ is a dominant industry;
S03, constructing an industrial network relation; an industrial network relation is built by using the industrial network relation module: by means of a minimum spanning tree and threshold demarcation, the relations between industries are screened according to the dominant industry association strength obtained by the industrial space comprehensive processing module, the core data for building the industrial map is retained, and the map is visualized by a complex network visualization method to generate the industrial map;
S04, carrying out industry prediction by using an industry prediction module; according to the regional advantage industry obtained in the industrial space comprehensive processing module, a potential advantage industry vertex is obtained in a graph calculation mode based on the established industry map of the industrial network relation module, and a future industry development prediction directory is generated; the method comprises the following specific substeps:
S0401, taking the regional dominant industries obtained in step S0201 as basic industry vertices; the user adds or deletes dominant industries on this basis, and the dominant industries after modification and confirmation are taken as the basic industry vertices for simulation prediction;
S0402, obtaining a future industrial development prediction directory; the user sets the industry transformation development conditions, namely inputs an industry connection threshold, a number of iterations or a number of industry connection steps; based on the basic industry vertices, the national industry map generated in step S03 is expanded outward from those vertices to obtain the potential dominant industry vertices that may be associated, and a future industrial development prediction directory is generated, the industries in which are called future industries;
S05, screening future industries by using the evaluation screening module to obtain final industries, and displaying the final industries as the industrial development prediction result; the method comprises the following specific substeps:
S0501, a user selects an industrial development index combination to be used, or uses a default industrial development index combination, and uses an industrial evaluation index system module to calculate various industrial development indexes for future industries;
S0502, after comprehensive sorting according to the index results, screening to obtain the final industries, and displaying the final industries as the industrial development prediction result.
10. The method for industrial development prediction by constructing an industrial map using big data according to claim 9, wherein step S01 specifically comprises the following sub-steps:
S0101, acquiring original data; after the selected time range and space range are input, the registration detail data of industrial and commercial enterprises is acquired and cleaned, and the industrial basic data is constructed by combining the regional basic information;
S0102, respectively selecting the levels of space unit grouping and industry category grouping; the industrial basic data is grouped and summarized according to the input space unit level and industry category level;
S0103, selecting an enterprise index; one index is selected from a plurality of indexes for subsequent statistics and calculation, the enterprise index including at least one of the number of enterprises, tax, output value, main business revenue, or employment.
CN202310467143.2A 2023-04-27 2023-04-27 System and method for industrial development prediction by constructing industrial map by adopting big data Pending CN116502755A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310467143.2A CN116502755A (en) 2023-04-27 2023-04-27 System and method for industrial development prediction by constructing industrial map by adopting big data

Publications (1)

Publication Number Publication Date
CN116502755A true CN116502755A (en) 2023-07-28

Family

ID=87324276

Country Status (1)

Country Link
CN (1) CN116502755A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116663751A (en) * 2023-07-31 2023-08-29 北京市科学技术研究院 Three-network industry map construction method and system based on future industry enterprises
CN117709514A (en) * 2023-11-24 2024-03-15 武汉索元数据信息有限公司 Regional industry structure optimization method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination