CN107301471A - Method and system for accurate industry trend prediction based on big data - Google Patents

Info

Publication number
CN107301471A
Authority
CN
China
Prior art keywords
industry
data
module
storehouse
segmented
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710423129.7A
Other languages
Chinese (zh)
Inventor
李小强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qianhai Sycamore (shenzhen) Data Co Ltd
Original Assignee
Qianhai Sycamore (shenzhen) Data Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qianhai Sycamore (shenzhen) Data Co Ltd filed Critical Qianhai Sycamore (shenzhen) Data Co Ltd
Priority to CN201710423129.7A priority Critical patent/CN107301471A/en
Publication of CN107301471A publication Critical patent/CN107301471A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06Q DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation, e.g. linear programming, "travelling salesman problem" or "cutting stock problem"

Abstract

The present invention relates to a method and system for accurate industry trend prediction based on big data. The method includes: obtaining massive enterprise data; locating the industry segment of each enterprise according to the massive enterprise data; obtaining a micro-indicator library according to the industry segments and the massive enterprise data; establishing a life-cycle indicator library for each industry segment; for a sample industry to be predicted, using the micro-indicator library to predict whether the sample industry is trending upward or downward; and, for the sample industry to be predicted, using the life-cycle indicator library to predict the life-cycle stage of the sample industry. By obtaining massive enterprise data and precisely locating industry segments, the invention forms a micro-indicator library that is used to predict the upward or downward development trend of an industry, and establishes a life-cycle indicator library that is used to predict the industry's life-cycle stage. This improves the accuracy of prediction and analysis, allows particular industries to be assessed individually, compensates for the shortcomings of macro-level industry forecasting, and is highly generalizable.

Description

Method and system for accurate industry trend prediction based on big data
Technical field
The present invention relates to industry trend prediction methods, and more specifically to a method and system for accurate industry trend prediction based on big data.
Background
An industry generally refers to a category of economic activity grouped by the production of similar products, the use of the same process, or the provision of the same kind of labor service, such as the catering, apparel, machinery, finance, and mobile Internet industries. Analysis of the current state of an industry is mainly an analysis of the status of the whole industry based on its trend.
At present, industry trends are mainly analyzed by governments or professional institutions, but these are all macro-level analyses and forecasts. First, the industry classification is not precise enough. Second, macro analysis based on macro indicators such as electricity consumption can only analyze and predict certain specific industries, such as manufacturing; it struggles to capture hot industries that are attracting investment, and so has certain limitations.
Chinese patent 201410398967.X discloses an Electrical Manager's Index trend prediction method oriented to big data of typical industries, comprising: (1) screening the indicators that influence industry electricity consumption by correlation analysis; (2) testing and optimizing the screened indicators using statistical tests and rough set theory, and establishing an industry electricity monitoring indicator system model; (3) proposing a construction and prediction method based on the Electrical Manager's Index (EMI), and predicting industry electricity consumption trends based on that index. That patent can monitor dynamic changes in industry electricity consumption, analyze its current status and predict its development trend, and capture the pattern of supply and demand changes in the electricity market, providing a reliable basis for electricity marketing decisions.
However, that patent also analyzes industry status and predicts development trends from electricity consumption, so it works only for typical industries and cannot accurately analyze and predict emerging industries.
It is therefore necessary to design a method for accurate industry trend prediction based on big data that improves the accuracy of prediction and analysis, allows particular industries to be assessed individually, compensates for the shortcomings of macro-level industry forecasting, and is highly generalizable.
Summary of the invention
The object of the present invention is to overcome the defects of the prior art by providing a method and system for accurate industry trend prediction based on big data.
To achieve the above object, the present invention adopts the following technical solution: a method for accurate industry trend prediction based on big data, the method including:
Obtaining massive enterprise data;
Locating the industry segment of each enterprise according to the massive enterprise data;
Obtaining a micro-indicator library according to the industry segments and the massive enterprise data;
Establishing a life-cycle indicator library for each industry segment;
For a sample industry to be predicted, using the micro-indicator library to predict the upward or downward development trend of the sample industry;
For the sample industry to be predicted, using the life-cycle indicator library to predict the life-cycle stage of the sample industry.
In a further technical solution, the step of locating the industry segment of each enterprise according to the massive enterprise data includes the following specific steps:
Establishing an industry database according to the massive enterprise data;
Searching related information websites using the full name of the enterprise, and obtaining the returned search content;
Precisely analyzing the returned search content to obtain key terms;
Matching the key terms, counting them by group, and labeling them, to form the industry segment of the enterprise.
In a further technical solution, the step of obtaining a micro-indicator library according to the industry segments and the massive enterprise data includes the following specific steps:
Classifying the massive enterprise data according to the industry segments;
Analyzing the classified massive enterprise data according to each dimension of the industry segments;
Counting, by group, the enterprise data of each dimension of each industry segment;
Quantifying the enterprise data of each dimension of each industry segment to obtain the leading indicator of each dimension of each industry segment;
Integrating all leading indicators to form the micro-indicator library.
In a further technical solution, the step of using the micro-indicator library to predict the upward or downward development trend of the sample industry to be predicted includes the following specific steps:
Obtaining the sample industry to be predicted, and extracting time-series data from the sample industry data;
Querying the time-series data in the micro-indicator library, and vectorizing the time-series data;
Counting the frequency of the time-series data, grouped by TID;
Applying hierarchical indexing to the TID;
Computing the mean, standard deviation, and quantiles of the time-series data;
Performing linear regression prediction on the frequency, mean, standard deviation, and quantiles of the time-series data.
In a further technical solution, the step of performing linear regression on the frequency, mean, standard deviation, and quantiles of the time-series data includes the following specific steps:
Performing grouped modeling on the frequency, mean, standard deviation, and quantiles of the time-series data to obtain regression models;
Evaluating the error rate, R², and regression coefficients of each model, and plotting each model;
Predicting the upward or downward development trend with each model, and saving the prediction results.
The present invention also provides a system for accurate industry trend prediction based on big data, including a data acquisition unit, an industry positioning unit, a micro-indicator library acquisition unit, a life-cycle indicator library establishing unit, a development trend prediction unit, and a life-cycle prediction unit;
The data acquisition unit is used to obtain massive enterprise data;
The industry positioning unit is used to locate the industry segment of each enterprise according to the massive enterprise data;
The micro-indicator library acquisition unit is used to obtain a micro-indicator library according to the industry segments and the massive enterprise data;
The life-cycle indicator library establishing unit is used to establish a life-cycle indicator library for each industry segment;
The development trend prediction unit is used, for a sample industry to be predicted, to predict the upward or downward development trend of the sample industry using the micro-indicator library;
The life-cycle prediction unit is used, for the sample industry to be predicted, to predict the life-cycle stage of the sample industry using the life-cycle indicator library.
In a further technical solution, the industry positioning unit includes a database establishing module, a content acquisition module, a term acquisition module, and a processing module;
The database establishing module is used to establish an industry database according to the massive enterprise data;
The content acquisition module is used to search related information websites using the full name of the enterprise and obtain the returned search content;
The term acquisition module is used to precisely analyze the returned search content and obtain key terms;
The processing module is used to match the key terms, count them by group, and label them, to form the industry segment of the enterprise.
In a further technical solution, the micro-indicator library acquisition unit includes a classification module, an analysis module, a grouped statistics module, a quantification module, and an integration module;
The classification module is used to classify the massive enterprise data according to the industry segments;
The analysis module is used to analyze the classified massive enterprise data according to each dimension of the industry segments;
The grouped statistics module is used to count, by group, the enterprise data of each dimension of each industry segment;
The quantification module is used to quantify the enterprise data of each dimension of each industry segment and obtain the leading indicator of each dimension of each industry segment;
The integration module is used to integrate all leading indicators to form the micro-indicator library.
In a further technical solution, the development trend prediction unit includes a time-series data preparation module, a data processing module, a frequency statistics module, a hierarchical index module, a parameter statistics module, and a regression prediction module;
The time-series data preparation module is used to obtain the sample industry to be predicted and extract time-series data from the sample industry data;
The data processing module is used to query the time-series data in the micro-indicator library and vectorize the time-series data;
The frequency statistics module is used to count the frequency of the time-series data, grouped by TID;
The hierarchical index module is used to apply hierarchical indexing to the TID;
The parameter statistics module is used to compute the mean, standard deviation, and quantiles of the time-series data;
The regression prediction module is used to perform linear regression prediction on the frequency, mean, standard deviation, and quantiles of the time-series data.
In a further technical solution, the regression prediction module includes a modeling submodule, an evaluation submodule, and a prediction saving submodule;
The modeling submodule is used to perform grouped modeling on the frequency, mean, standard deviation, and quantiles of the time-series data to obtain regression models;
The evaluation submodule is used to evaluate the error rate, R², and regression coefficients of each model and to plot each model;
The prediction saving submodule is used to predict the upward or downward development trend with each model and save the prediction results.
Compared with the prior art, the invention has the following advantages. The method for accurate industry trend prediction based on big data obtains massive enterprise data and precisely locates industry segments; it classifies, counts, and quantifies the massive enterprise data along the different dimensions of each industry segment to form a micro-indicator library, which is used to predict the upward or downward development trend of an industry; and it establishes a life-cycle indicator library for each industry segment, which is used to predict the industry's life-cycle stage. This improves the accuracy of prediction and analysis, allows particular industries to be assessed individually, compensates for the shortcomings of macro-level industry forecasting, and is highly generalizable.
The invention is further described below with reference to the accompanying drawings and specific embodiments.
Brief description of the drawings
Fig. 1 is a flow chart of the method for accurate industry trend prediction based on big data provided by a specific embodiment of the invention;
Fig. 2 is a detailed flow chart of locating the industry segment of an enterprise, provided by a specific embodiment of the invention;
Fig. 3 is a detailed flow chart of obtaining the micro-indicator library, provided by a specific embodiment of the invention;
Fig. 4 is a detailed flow chart of using the micro-indicator library to predict the upward or downward development trend of the sample industry, provided by a specific embodiment of the invention;
Fig. 5 is a detailed flow chart of performing linear regression prediction, provided by a specific embodiment of the invention;
Fig. 6 is a structural block diagram of the system for accurate industry trend prediction based on big data provided by a specific embodiment of the invention;
Fig. 7 is a structural block diagram of the industry positioning unit provided by a specific embodiment of the invention;
Fig. 8 is a structural block diagram of the micro-indicator library acquisition unit provided by a specific embodiment of the invention;
Fig. 9 is a structural block diagram of the development trend prediction unit provided by a specific embodiment of the invention;
Fig. 10 is a structural block diagram of the regression prediction module provided by a specific embodiment of the invention.
Detailed description of the embodiments
For a fuller understanding of the technical content of the invention, the technical solution is further introduced and explained below with reference to specific embodiments, but it is not limited to them.
In the specific embodiment shown in Figs. 1 to 10, the method for accurate industry trend prediction based on big data provided by this embodiment is used by institutions to predict the current development of individual industries. It improves the accuracy of prediction and analysis, allows particular industries to be assessed individually, compensates for the shortcomings of macro-level industry forecasting, and is highly generalizable.
As shown in Fig. 1, this embodiment provides a method for accurate industry trend prediction based on big data, the method including:
S1, obtaining massive enterprise data;
S2, locating the industry segment of each enterprise according to the massive enterprise data;
S3, obtaining a micro-indicator library according to the industry segments and the massive enterprise data;
S4, establishing a life-cycle indicator library for each industry segment;
S5, for a sample industry to be predicted, using the micro-indicator library to predict the upward or downward development trend of the sample industry;
S6, for the sample industry to be predicted, using the life-cycle indicator library to predict the life-cycle stage of the sample industry.
In other embodiments, steps S5 and S6 may be performed in the reverse order.
In step S1, data crawling technology is used to collect, from the Internet within a set time, the enterprise data of every industry segment nationwide as the massive enterprise data. After the massive enterprise data is obtained, it also needs to be updated regularly so that enterprise data accumulates; the support of massive, reliable data improves the accuracy of prediction and analysis.
Further, step S2, locating the industry segment of each enterprise according to the massive enterprise data, includes the following specific steps:
S21, establishing an industry database according to the massive enterprise data;
S22, searching related information websites using the full name of the enterprise, and obtaining the returned search content;
S23, precisely analyzing the returned search content to obtain key terms;
S24, matching the key terms, counting them by group, and labeling them, to form the industry segment of the enterprise.
For step S21, natural-language semantic analysis is performed on the enterprise data to obtain investment-type terms, which are compared with the existing database; terms that are new are added to the industry data. For example, the following text is obtained from a website: "Sanjiang Shopping Club Co., Ltd. is currently one of the largest supermarket chains in Zhejiang Province, ranks among the top 100 chain enterprises in China, is a large chain enterprise given special support by the Zhejiang provincial government, and is a key contact enterprise of the national economic and trade commission. The company currently has two large distribution centers covering more than 130,000 square meters in total. It has nearly 10,000 employees and more than 1.31 million customer members, and nearly 500,000 customers shop at Sanjiang's chain stores every day." Natural-language semantic analysis yields the industry terms "logistics transportation, chain supermarket, supermarket distribution". Analysis of a large amount of information shows that "logistics transportation" and "supermarket distribution" are on a rising trend. After these two terms are compared with the database, a new industry hierarchy is established, from "transportation, storage and postal services" to "logistics" and then to "supermarket distribution".
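The comparison against the existing database in step S21 can be sketched as follows. This is a minimal illustration that assumes term extraction has already been done and that the industry database is a plain in-memory mapping; the names `industry_db`, `add_new_terms`, and `paths`, and the hierarchy given for "chain supermarket", are hypothetical and do not come from the patent.

```python
# Hypothetical in-memory industry database: term -> hierarchy path (illustrative).
industry_db = {
    "chain supermarket": ["wholesale and retail", "retail", "chain supermarket"],
}

def add_new_terms(db, extracted, paths):
    """Add extracted terms that are not yet in db and return the new terms."""
    new_terms = [term for term in extracted if term not in db]
    for term in new_terms:
        # paths supplies the assumed hierarchy for each new term
        db[term] = paths[term]
    return new_terms

extracted = ["logistics transportation", "chain supermarket", "supermarket distribution"]
paths = {
    "logistics transportation": ["transportation, storage and postal services", "logistics"],
    "supermarket distribution": [
        "transportation, storage and postal services", "logistics", "supermarket distribution",
    ],
}
added = add_new_terms(industry_db, extracted, paths)
```

Only the two terms absent from the database are added; "chain supermarket" is left unchanged.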
In step S22, related information websites are searched using the full name of the enterprise, and the returned search content is obtained. The related information websites include Baidu, Baidu News, online Yellow Pages, enterprise official websites, Weibo, WeChat, recruitment sites, industrial and commercial registration information, patent information, SEO, and so on. Collection and processing are based on public Internet information, so no sensitive information is involved and the data acquisition cost is low. When a new enterprise is added, the massive data of the related information websites is collected from the Internet using the full name of the enterprise and stored in a distributed manner using HDFS; this massive data serves as the returned search content. Relying on mature big data technology ensures safe and efficient storage of the massive data, and accuracy improves continuously as data accumulates.
For step S23, industry-oriented semantic word segmentation is performed on the returned search content, yielding terms such as online education, mobile Internet, New Fourth Board, VC, angel, PE, trading market, New Third Board, mergers, mergers and reorganizations, ChiNext (GEM), SME Board, Main Board, overseas, investment banking, live streaming, and O2O. Specifically, the collected data is cleaned, classified, summarized, and processed by keyword extraction, word segmentation, and semantic analysis to obtain a precise corpus; the precise corpus is then matched against the reference corpus to obtain the key terms with the corresponding investment attributes.
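The cleaning and matching in step S23 can be illustrated with a small sketch. It assumes a fixed lexicon stands in for the reference corpus and uses plain substring counting on English text; real Chinese text would need a proper word segmenter, which the patent does not name. `LEXICON` and `extract_keywords` are hypothetical names.

```python
import re

# Hypothetical investment-attribute lexicon (stands in for the reference corpus).
LEXICON = ["online education", "mobile internet", "O2O", "merger"]

def extract_keywords(raw_text, lexicon=LEXICON):
    """Clean raw search text and count the lexicon terms found in it."""
    text = re.sub(r"<[^>]+>", " ", raw_text)          # strip HTML tags
    text = re.sub(r"\s+", " ", text).strip().lower()  # normalize whitespace and case
    counts = {}
    for term in lexicon:
        n = text.count(term.lower())
        if n:
            counts[term] = n
    return counts

sample = "<p>Online  education platform with O2O services; online education</p>"
keywords = extract_keywords(sample)
```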
For step S24, the key terms are matched against the industries in the industry database and counted by group, industry attributes with a reasonable rank and weight are selected, and precise industry and product labels are applied to the enterprise. Matching and grouped counting of the massive data are carried out using MapReduce, which guarantees efficient distributed processing of the massive data, with accuracy improving continuously as data accumulates. Driven by big data technology and based on a distributed parallel computing framework, this solves the storage and computation of massive data. The terms are counted; for example, if "online education" appears seven times and matches the online education label in the database, a statistical algorithm selects the industry attributes with a reasonable rank and weight. After algorithmic optimization, the enterprise is given an industry label such as internet/Internet information services/online education, which determines the industry segment of the enterprise so that later industry trend analysis can be refined down to that segment.
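The counting and labeling in step S24 can be sketched in a single process as follows. The patent describes a MapReduce implementation and does not specify the ranking or weighting algorithm, so this sketch simply takes the most frequent matched term; `LABELS` and `label_enterprise` are hypothetical names.

```python
from collections import Counter

# Hypothetical label table: key term -> industry label path.
LABELS = {
    "online education": "internet/Internet information services/online education",
    "mobile internet": "internet/mobile internet",
}

def label_enterprise(tokens, labels=LABELS, top_n=1):
    """Count terms that match a known label and return the top-ranked labels."""
    counts = Counter(t for t in tokens if t in labels)
    return [(labels[term], n) for term, n in counts.most_common(top_n)]

tokens = ["online education"] * 7 + ["mobile internet"] * 2 + ["live streaming"]
result = label_enterprise(tokens)
```

Terms without a matching label, such as "live streaming" here, are ignored in the ranking.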
Further, step S3, obtaining a micro-indicator library according to the industry segments and the massive enterprise data, includes the following specific steps:
S31, classifying the massive enterprise data according to the industry segments;
S32, analyzing the classified massive enterprise data according to each dimension of the industry segments;
S33, counting, by group, the enterprise data of each dimension of each industry segment;
S34, quantifying the enterprise data of each dimension of each industry segment to obtain the leading indicator of each dimension of each industry segment;
S35, integrating all leading indicators to form the micro-indicator library.
For step S31, once the massive enterprise data has been classified by industry segment, each segment can be analyzed separately. Refining the analysis down to the enterprise data of a single segment yields a more detailed picture of that segment's current trend.
For step S32, the dimensions include patent concentration, industrial capital, concentration of financial capital, time, and so on, but are not limited to these. When analyzing the current trend of each industry segment, the analysis must be carried out dimension by dimension; only then can the whole segment be analyzed comprehensively and the accuracy of analysis and prediction be improved.
In steps S33 and S34, the enterprise data of each dimension in each industry segment is counted and quantified, and the counted and quantified data serves as the leading indicators.
In step S35, all leading indicators are stored in the same database, which is the micro-indicator library.
Specifically, for steps S31 to S35, when obtaining the leading indicator for the patent concentration dimension of each industry segment, the total number of patents of the three types (invention patents, utility models, and designs) in each segment is counted, and this total is saved to the database as the leading indicator of patent concentration, forming the micro-indicator library. The leading indicators of the other dimensions are obtained in the same way as the patent concentration indicator and are also stored in the micro-indicator library.
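The patent-concentration example can be sketched with pandas as follows. The records and the column names `segment`, `patent_type`, and `count` are illustrative assumptions, as is the long-format layout chosen for the micro-indicator library.

```python
import pandas as pd

# Illustrative records; the column names are assumptions, not from the patent.
patents = pd.DataFrame({
    "segment": ["online education", "online education",
                "logistics", "logistics", "logistics"],
    "patent_type": ["invention", "design", "invention", "utility model", "design"],
    "count": [12, 3, 5, 8, 4],
})

# Sum the three patent types per segment to get the patent-concentration indicator.
patent_concentration = patents.groupby("segment")["count"].sum()

# Store the indicator as one row per (segment, dimension) in the library.
micro_indicator_library = (
    patent_concentration.rename("value")
    .reset_index()
    .assign(dimension="patent concentration")
)
```

Indicators for other dimensions could be appended to the same table with a different `dimension` value.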
For step S4, a life-cycle indicator library is established for each industry segment. The life-cycle indicators in the library cover the start-up, growth, prosperity, and decline stages; for example, the life-cycle indicator library of the VR industry contains data for the start-up, growth, prosperity, and decline stages.
Further, step S5, using the micro-indicator library to predict the upward or downward development trend of the sample industry to be predicted, specifically uses an SVM (support vector machine) to perform time-series regression on the sample industry to be predicted, forecasting its upward or downward development trend.
It includes the following specific steps:
S51, obtaining the sample industry to be predicted, and extracting time-series data from the sample industry data;
S52, querying the time-series data in the micro-indicator library, and vectorizing the time-series data;
S53, counting the frequency of the time-series data, grouped by TID;
S54, applying hierarchical indexing to the TID;
S55, computing the mean, standard deviation, and quantiles of the time-series data;
S56, performing linear regression prediction on the frequency, mean, standard deviation, and quantiles of the time-series data.
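The description names an SVM for the time-series regression, while step S56 and the later snippets use linear regression. As one possible reading of the SVM variant, the sketch below fits scikit-learn's `SVR` with a linear kernel to a synthetic indicator series. The data, the hyperparameters, and the rule used to call the direction are all assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic monthly indicator values (illustrative only).
t = np.arange(24, dtype=float).reshape(-1, 1)
values = 0.5 * t.ravel() + 10.0

# Fit on the first 18 months and forecast the remaining 6.
model = SVR(kernel="linear", C=10.0, epsilon=0.1)
model.fit(t[:18], values[:18])
forecast = model.predict(t[18:])

# Assumed rule: compare the last forecast with the last observed training value.
direction = "up" if forecast[-1] > values[17] else "down"
```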
For step S51, extracting only the useful time-series data from the sample industry data helps shorten prediction time and improve prediction efficiency.
For step S52, the time-series data is vectorized so that it can be compared with the leading indicators in the micro-indicator library, which facilitates subsequent statistics and analysis. The table below shows the data after vectorization:
TID   TDATE      Index 1   Index 2   Index 3
1     2009/1/1    0.003    44        1.3
2     2009/2/1    0.005    22        3.6
3     2009/3/1   -0.004    12        7.1
For step S53, the vectorized time-series data is grouped by TID, so the frequency needs to be counted by TID.
In addition, for step S54, hierarchical indexing is used to convert the TID rows into columns.
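Steps S53 and S54 can be sketched with pandas as follows. The long-format input and the column names `indicator` and `value` are assumptions; the patent does not show how the row-to-column conversion is implemented, and `set_index` followed by `unstack` is only one common way to do it.

```python
import pandas as pd

# Long-format records: one row per (TID, indicator) observation (illustrative).
long_df = pd.DataFrame({
    "TID": [1, 1, 1, 2, 2, 3],
    "indicator": ["Index 1", "Index 2", "Index 3", "Index 1", "Index 2", "Index 1"],
    "value": [0.003, 44, 1.3, 0.005, 22, -0.004],
})

# S53: frequency of observations per TID.
frequency = long_df.groupby("TID").size()

# S54: build a (TID, indicator) hierarchical index, then pivot indicators to columns.
wide_df = long_df.set_index(["TID", "indicator"])["value"].unstack("indicator")
```

Combinations that have no observation appear as missing values in `wide_df`.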
In step S55, when predicting the sample industry, the mean, standard deviation, and quantiles must be computed in addition to the frequency; analyzing multiple parameters helps improve the accuracy of the overall prediction. The computed parameters need to be presented in descriptive form; in the original, a table at this point shows the statistical description format for a set of time-series data.
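A statistical description of the kind step S55 calls for can be computed with pandas, for example as below. The values are illustrative, and the choice of the 25%, 50%, and 75% quantiles is an assumption, since the patent does not state which quantiles are used.

```python
import pandas as pd

series = pd.Series([44, 22, 12, 30, 18], name="Index 2")  # illustrative values

summary = {
    "count": int(series.count()),
    "mean": series.mean(),
    "std": series.std(),            # sample standard deviation (ddof=1)
    "q25": series.quantile(0.25),
    "q50": series.quantile(0.50),
    "q75": series.quantile(0.75),
}
```

`series.describe()` returns a similar summary in a single call.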
Further, step S56, performing linear regression on the frequency, mean, standard deviation, and quantiles of the time-series data, includes the following specific steps:
S561, performing grouped modeling on the frequency, mean, standard deviation, and quantiles of the time-series data to obtain regression models;
S562, evaluating the error rate, R², and regression coefficients of each model, and plotting each model;
S563, predicting the upward or downward development trend with each model, and saving the prediction results.
For step S561, grouped modeling makes it easier to split off test data and improves prediction efficiency. The modeling method is as follows:
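import numpy as np
import pandas as pd
import sklearn.linear_model

# filldata is assumed to be a pandas DataFrame: target in column 0, features after it
sampleR = 0.6  # fraction of samples used for training
y = filldata.iloc[:, 0]
x = filldata.iloc[:, 1:2]  # positional slice: feature column 1; widen it to use more columns
nsample = len(y)
sampleBoundary = int(nsample * sampleR)
shffleIdx = list(range(nsample))  # a list, so that it can be shuffled in place
np.random.shuffle(shffleIdx)
train_y = y.iloc[shffleIdx[:sampleBoundary]]
train_x = x.iloc[shffleIdx[:sampleBoundary]]
test_x = x.iloc[shffleIdx[sampleBoundary:]]
test_y = y.iloc[shffleIdx[sampleBoundary:]]
# linear regression model
LR = sklearn.linear_model.LinearRegression()
LR.fit(train_x, train_y)
predict_y = LR.predict(test_x)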
For step S562, plotting each model makes the industry trend easier to observe. The error is computed as follows:
ysample = range(len(test_y))  # x-axis positions of the test samples, for plotting
error = np.linalg.norm(predict_y - test_y, ord=1) / len(test_y)  # mean absolute error
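Step S562 also mentions R² and the regression coefficients, which the snippet above does not compute. A minimal sketch of all three metrics on synthetic data follows; the data, the 30/20 split, and the variable names are assumptions, and plotting is left out.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data with a known slope of 3 and intercept of 2 (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(50, 1))
y = 3.0 * x.ravel() + 2.0 + rng.normal(0, 0.1, size=50)

model = LinearRegression().fit(x[:30], y[:30])
pred = model.predict(x[30:])

mae = np.mean(np.abs(pred - y[30:]))  # mean absolute error (L1 norm divided by n)
r2 = r2_score(y[30:], pred)           # coefficient of determination
slope = model.coef_[0]                # regression coefficient
```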
For step S563, prediction with the models compensates for the shortcomings of macro-level industry forecasting, and because the prediction is built on a precise division into industry segments, it can improve prediction accuracy. The prediction is carried out as follows:
LR = sklearn.linear_model.LinearRegression()
LR.fit(x, y)
pre_y = LR.predict(x)
# save the prediction results
res = pd.DataFrame(pre_y)
res.to_csv('result.csv', header=False, index=False)
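The patent does not spell out how an upward or downward call is derived from the fitted model. One simple possibility, shown here purely as an assumption, is to read the sign of the least-squares slope over time; `trend_direction` is a hypothetical helper, and a zero slope is reported as "down" in this sketch.

```python
import numpy as np

def trend_direction(values):
    """Return 'up' or 'down' from the sign of the least-squares slope over time."""
    t = np.arange(len(values))
    slope, _intercept = np.polyfit(t, values, 1)
    return "up" if slope > 0 else "down"

# Columns taken from the vectorized example table.
index2_trend = trend_direction([44, 22, 12])
index3_trend = trend_direction([1.3, 3.6, 7.1])
```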
Step S6 is mainly used to analyze the life-cycle stage of the industry. Combining the life-cycle stage with the upward or downward development trend allows the industry trend to be predicted comprehensively and accurately.
The above method for accurate industry trend prediction based on big data obtains massive enterprise data and precisely locates industry segments; it classifies, counts, and quantifies the massive enterprise data along the different dimensions of each industry segment to form a micro-indicator library, which is used to predict the upward or downward development trend of an industry; and it establishes a life-cycle indicator library for each industry segment, which is used to predict the industry's life-cycle stage. This improves the accuracy of prediction and analysis, allows particular industries to be assessed individually, compensates for the shortcomings of macro-level industry forecasting, and is highly generalizable.
As shown in Fig. 6, this embodiment also provides a big-data-based system for accurately predicting industry trends, which includes a data acquisition unit 1, an industry positioning unit 2, a micro-indicator library acquisition unit 3, a life cycle indicator library establishing unit 4, a development trend prediction unit 5, and a life cycle prediction unit 6.
The data acquisition unit 1 is configured to obtain massive enterprise data.
The industry positioning unit 2 is configured to position the segmented industry of each enterprise according to the massive enterprise data.
The micro-indicator library acquisition unit 3 is configured to obtain a micro-indicator library according to the segmented industries and the massive enterprise data.
The life cycle indicator library establishing unit 4 is configured to establish a life cycle indicator library for each segmented industry.
The development trend prediction unit 5 is configured to use the micro-indicator library to predict, for a sample industry to be predicted, whether its development trend is upward or downward.
The life cycle prediction unit 6 is configured to use the life cycle indicator library to predict, for a sample industry to be predicted, the life cycle stage the sample industry is in.
Specifically, the data acquisition unit 1 uses data crawling technology to collect, within a set time, enterprise data of every segmented industry nationwide from the Internet as the massive enterprise data. After the massive enterprise data is obtained, it also needs to be updated regularly so that enterprise data accumulates; the support of massive, reliable data improves the accuracy of prediction and analysis.
Further, the industry positioning unit 2 includes a database establishing module 21, a content acquisition module 22, a word segmentation acquisition module 23, and a processing module 24.
The database establishing module 21 is configured to establish an industry database according to the massive enterprise data.
The content acquisition module 22 is configured to search related information websites using the enterprise's full name and obtain the returned search content.
The word segmentation acquisition module 23 is configured to precisely analyze the returned search content and obtain key word segments.
The processing module 24 is configured to match the key word segments, compute grouped statistics, and attach labels, thereby determining the enterprise's segmented industry.
Specifically, the database establishing module 21 performs natural-language semantic analysis on the enterprise data to obtain investment-related words and compares them with the existing database; words that are new are added to the industry data. For example, the following text is obtained from a website: "Sanjiang Shopping Club Co., Ltd. is currently one of the largest supermarket chains in Zhejiang Province, one of the top 100 chain enterprises in China, a large chain enterprise given key support by the Zhejiang provincial government, and a key contact enterprise of China's economic and trade commission. The company currently has two large distribution centers covering more than 130,000 square meters in total. It has nearly 10,000 employees and more than 1.31 million customer members, and nearly 500,000 customers shop at Sanjiang's chain stores every day." Natural-language semantic analysis yields the industry word segments "logistics transportation", "chain supermarket", and "supermarket distribution". Analysis of a large amount of information shows that "logistics transportation" and "supermarket distribution" are on a rising trend; after these two terms are compared with the database, a new industry path is established from "transportation, storage and postal services" to "logistics" and then to "supermarket distribution".
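The comparison against the existing database can be sketched in a few lines. This is only an illustration under stated assumptions: the term counts, the `min_growth` threshold, and the function name `rising_new_terms` are hypothetical and do not come from the original, which does not say how a "rising trend" is measured.

```python
from collections import Counter

# Hypothetical state of the industry database and term counts for two periods.
existing_terms = {"chain supermarket"}
earlier = Counter({"logistics transportation": 12,
                   "chain supermarket": 40,
                   "supermarket distribution": 5})
recent = Counter({"logistics transportation": 30,
                  "chain supermarket": 41,
                  "supermarket distribution": 18})


def rising_new_terms(earlier, recent, existing, min_growth=1.5):
    """Return terms absent from the database whose count grew by min_growth."""
    return sorted(
        term for term, n in recent.items()
        if term not in existing
        and n > 0
        and n >= min_growth * earlier.get(term, 0)
    )


new_terms = rising_new_terms(earlier, recent, existing_terms)
# New, rising terms are added to the industry data.
existing_terms.update(new_terms)
print(new_terms)
```

A growth ratio is used here only because it is simple; any other trend measure could take its place.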
Specifically, the content acquisition module 22 searches related information websites using the enterprise's full name and obtains the returned search content. The related information websites include Baidu, Baidu News, online yellow pages, enterprise official websites, Weibo, WeChat, recruitment sites, industrial and commercial registration information, patent information, SEO information, and so on. Collection and processing rely on public Internet information, involve no sensitive information, and keep data acquisition costs low. When a new enterprise is added, massive data from the related information websites is collected from the Internet using the enterprise's full name and stored in a distributed manner with HDFS, and this data serves as the returned search content. Relying on mature big data technology ensures safe and efficient storage of the massive data, and accuracy keeps improving as data accumulates.
Specifically, the word segmentation acquisition module 23 performs industry-semantic word segmentation on the returned search content, yielding segments such as online education, mobile Internet, New Fourth Board, VC, angel, PE, trading market, New Third Board, mergers, M&A and restructuring, GEM (ChiNext), SME Board, Main Board, overseas, investment banking, live streaming, and O2O. In detail, the collected data is cleaned, classified, summarized, and subjected to keyword extraction, word segmentation, and semantic analysis to obtain an accurate corpus; this corpus is then matched against a corpus database to obtain the key word segments carrying the corresponding investment attributes.
Specifically, the processing module 24 matches the key word segments against the industries in the industry database and computes grouped statistics, filters out the industry attributes with reasonable rank and weight, and attaches accurate industry and product labels to the enterprise. MapReduce is used to match and count the massive data, which ensures distributed, efficient processing, with accuracy improving as data accumulates. Big data technology, built on a distributed parallel computing framework, is the core that solves the storage and computation of massive data. The word segments above are counted; for example, if "online education" occurs seven times and matches the "online education" label in the database, a statistical algorithm selects the industry attributes with reasonable rank and weight. After algorithmic optimization, the enterprise is given an industry label such as Internet/Internet information services/online education, which determines the enterprise's segmented industry so that the subsequent industry trend analysis can be refined down to that level.
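The matching, counting, and labeling described for the processing module can be illustrated with a minimal single-machine sketch. It assumes the key word segments have already been extracted and that the industry database is a plain keyword-to-label mapping; `INDUSTRY_DB`, `label_enterprise`, and the `min_count` threshold are hypothetical names introduced for illustration, and the original performs this step with MapReduce on far larger data.

```python
from collections import Counter

# Hypothetical industry database: keyword -> industry label path.
INDUSTRY_DB = {
    "online education": "Internet/Internet information services/Online education",
    "O2O": "Internet/Internet information services/O2O",
    "logistics": "Transportation, storage and postal services/Logistics",
}


def label_enterprise(segments, industry_db, min_count=2):
    """Count matched keywords and return (label, count) pairs ranked by count."""
    # Keep only segments that match an entry in the industry database.
    counts = Counter(seg for seg in segments if seg in industry_db)
    # Rank by count (descending) and drop keywords below the threshold.
    ranked = [(kw, n) for kw, n in counts.most_common() if n >= min_count]
    return [(industry_db[kw], n) for kw, n in ranked]


# "online education" appears seven times, as in the example in the text.
segments = ["online education"] * 7 + ["O2O"] * 2 + ["VC", "logistics"]
labels = label_enterprise(segments, INDUSTRY_DB)
print(labels[0])
```

Ranking by raw count stands in for the "rank and weight" selection, whose exact statistical algorithm the original does not specify.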
Further, the micro-indicator library acquisition unit 3 includes a classification module 31, an analysis module 32, a grouping statistics module 33, a quantization module 34, and an integration module 35.
The classification module 31 is configured to classify the massive enterprise data according to the segmented industries.
The analysis module 32 is configured to analyze the classified massive enterprise data according to each dimension of the segmented industries.
The grouping statistics module 33 is configured to compute grouped statistics on the enterprise data of each dimension of each segmented industry.
The quantization module 34 is configured to quantify the enterprise data of each dimension of each segmented industry to obtain the leading indicators of each dimension of each segmented industry.
The integration module 35 is configured to integrate all leading indicators to form the micro-indicator library.
Specifically, the classification module 31 classifies the massive enterprise data by segmented industry so that each segmented industry can be analyzed separately; refining the analysis down to the enterprise data of a segmented industry yields a more detailed picture of that industry's current trend.
The dimensions used by the analysis module 32 include patent concentration, industrial capital, financial capital concentration, time, and so on, but are not limited to these. When analyzing the current trend of each segmented industry, the analysis must be carried out dimension by dimension; only then can the whole segmented industry be analyzed comprehensively, improving the accuracy of analysis and prediction.
The grouping statistics module 33 and the quantization module 34 count and quantify the enterprise data of each dimension in each segmented industry, and the counted and quantified data serves as the leading indicators.
The integration module 35 stores all leading indicators in the same database, which is the micro-indicator library.
To obtain the leading indicator for the patent concentration dimension of each segmented industry, the total number of patents of the three types (invention, utility model, and design) in each segmented industry is counted, and this total is saved to the database as the leading indicator of patent concentration, forming the micro-indicator library. The leading indicators of the other dimensions are obtained in the same way and stored in the micro-indicator library.
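The patent concentration indicator described above amounts to a grouped sum. The following is a minimal sketch with pandas, assuming one row per enterprise; the column names, the sample figures, and the dictionary standing in for the micro-indicator library are all hypothetical.

```python
import pandas as pd

# Hypothetical per-enterprise patent counts; column names are assumptions.
patents = pd.DataFrame({
    "segmented_industry": ["online education", "online education", "logistics"],
    "invention": [3, 1, 0],
    "utility_model": [2, 0, 4],
    "design": [1, 1, 2],
})

# Add up the three patent types per enterprise ...
patent_types = ["invention", "utility_model", "design"]
patents["patent_total"] = patents[patent_types].sum(axis=1)
# ... then total them per segmented industry.
patent_concentration = patents.groupby("segmented_industry")["patent_total"].sum()

# Store the result as one leading indicator in a simple in-memory "library".
micro_indicator_library = {"patent_concentration": patent_concentration.to_dict()}
print(micro_indicator_library)
```

Other dimensions would be added to the same dictionary under their own keys, mirroring how the text stores all leading indicators in one database.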
The life cycle indicator library establishing unit 4 establishes a life cycle indicator library for each segmented industry. The life cycle indicators in this library include the start-up stage, growth stage, prosperity stage, and decline stage; for example, the life cycle indicator library of the VR industry includes data for the start-up, growth, prosperity, and decline stages.
Further, the development trend prediction unit 5 specifically uses an SVM (support vector machine) to perform time-series regression on the sample industry to be predicted, predicting whether the industry's development trend is upward or downward.
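The original gives no SVM code, so the following is only a sketch of how SVM-based time-series regression could look with scikit-learn's `SVR`. The lag-feature construction, the synthetic series, and the hyperparameters (`kernel`, `C`, `epsilon`, `n_lags`) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVR


def make_lagged(series, n_lags):
    """Build (X, y) where each row of X holds the n_lags values preceding y."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = np.array(series[n_lags:])
    return X, y


# Synthetic monthly indicator with a steady upward drift (illustrative only).
series = [0.1 * t for t in range(24)]
n_lags = 3
X, y = make_lagged(series, n_lags)

# Support vector regression on the lagged values.
model = SVR(kernel="linear", C=10.0, epsilon=0.01)
model.fit(X, y)

# Predict the next value from the last n_lags observations and read off the direction.
next_value = model.predict(np.array([series[-n_lags:]]))[0]
trend = "upward" if next_value > series[-1] else "downward"
print(trend)
```

Comparing the one-step-ahead prediction with the last observed value is one simple way to turn a regression output into an up-or-down call.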
In addition, the development trend prediction unit 5 includes a time-series data preparation module 51, a data processing module 52, a frequency statistics module 53, a hierarchical index module 54, a parameter statistics module 55, and a regression prediction module 56.
The time-series data preparation module 51 is configured to obtain the sample industry to be predicted and extract time-series data from the sample industry data.
The data processing module 52 is configured to look up the time-series data in the micro-indicator library and vectorize the time-series data.
The frequency statistics module 53 is configured to count the frequency of the time-series data grouped by TID.
The hierarchical index module 54 is configured to build a hierarchical index on TID.
The parameter statistics module 55 is configured to compute the mean, standard deviation, and quantiles of the time-series data.
The regression prediction module 56 is configured to perform linear regression prediction on the frequency, mean, standard deviation, and quantiles of the time-series data.
The time-series data preparation module 51 extracts the useful time-series data from the sample industry data, which helps reduce prediction time and improve prediction efficiency.
The data processing module 52 vectorizes the time-series data so that it can be compared with the leading indicators in the micro-indicator library, which facilitates subsequent statistics and analysis. The table below shows the data after vectorization:
TID  TDATE     Index 1  Index 2  Index 3
1    2009/1/1   0.003   44       1.3
2    2009/2/1   0.005   22       3.6
3    2009/3/1  -0.004   12       7.1
After vectorization, the time-series data is grouped by TID; the frequency statistics module 53 therefore counts frequencies per TID.
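Counting frequencies per TID corresponds to a grouped size count. A minimal pandas sketch follows; the sample rows (including the repeated TIDs) and the column name `index_1` are hypothetical.

```python
import pandas as pd

# Hypothetical vectorized records; several rows may share one TID.
records = pd.DataFrame({
    "TID": [1, 1, 2, 3, 3, 3],
    "TDATE": ["2009/1/1", "2009/1/1", "2009/2/1",
              "2009/3/1", "2009/3/1", "2009/3/1"],
    "index_1": [0.003, 0.004, 0.005, -0.004, -0.002, 0.001],
})

# Number of records in each TID group.
frequency = records.groupby("TID").size()
print(frequency.to_dict())
```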
The hierarchical index module 54 builds a hierarchical index on TID so that rows can be pivoted into columns.
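One way to read the row-to-column processing is as a pandas hierarchical index followed by `unstack`. The sketch below assumes a long-form layout with one row per (TID, indicator) pair; that layout and the column names are assumptions, since the original does not show the intermediate data.

```python
import pandas as pd

# Hypothetical long-form data: one row per (TID, indicator) pair.
long_form = pd.DataFrame({
    "TID": [1, 1, 2, 2],
    "indicator": ["index_1", "index_2", "index_1", "index_2"],
    "value": [0.003, 44.0, 0.005, 22.0],
})

# Build a two-level (hierarchical) index on TID and indicator ...
indexed = long_form.set_index(["TID", "indicator"])["value"]
# ... then pivot the inner level from rows into columns.
wide = indexed.unstack("indicator")
print(wide)
```

The result has one row per TID and one column per indicator, which matches the shape of the vectorized table.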
When the parameter statistics module 55 works on the sample industry, it computes not only the frequency but also the mean, standard deviation, and quantiles; analyzing multiple parameters helps improve the accuracy of the overall prediction. The computed parameters need to be described; the table below shows the descriptive statistics for some time-series data:
rev    1           2           3
count  973.000000  973.000000  973.000000
mean     0.000843    0.000754    0.000329
std      0.021283    0.021971    0.014343
min     -0.089100   -0.099450   -0.067440
25%     -0.010480   -0.011000   -0.007110
50%      0.000843    0.000000    0.000730
75%      0.009670    0.011000    0.008500
max      0.100080    0.100000    0.061150
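The layout of the table above resembles the output of pandas `DataFrame.describe()`, which reports count, mean, standard deviation, minimum, quartiles, and maximum per column. The sketch below reproduces only that layout on random data; the generated values are synthetic and unrelated to the figures in the table.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for three time series of 973 observations each.
rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.normal(0.0, 0.02, size=(973, 3)), columns=[1, 2, 3])

# Descriptive statistics per column: count, mean, std, min, 25%, 50%, 75%, max.
stats = returns.describe()
print(stats)
```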
Further, the regression prediction module 56 includes a modeling submodule 561, an evaluation submodule 562, and a prediction saving submodule 563.
The modeling submodule 561 is configured to model the frequency, mean, standard deviation, and quantiles of the time-series data in groups to obtain regression models.
The evaluation submodule 562 is configured to evaluate the error rate, R-squared, and regression coefficients of each model and to plot each model.
The prediction saving submodule 563 is configured to predict the upward or downward development trend with each model and to save the prediction results.
The modeling submodule 561 models the data in groups, which makes it easy to split off test data and improves prediction efficiency. The modeling method is as follows:
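import numpy as np
import sklearn.linear_model

# filldata: the prepared indicator DataFrame (defined earlier)
sampleR = 0.6  # share of samples used for training
# .ix was removed from pandas; .iloc selects by position, so adjust the slice to the intended columns
y = filldata.iloc[:, 0]
x = filldata.iloc[:, 1:2]
nsample = len(y)
sampleBoundary = int(nsample * sampleR)
# shuffle row positions (a range object cannot be shuffled in place)
shffleIdx = np.arange(nsample)
np.random.shuffle(shffleIdx)
train_y = y.iloc[shffleIdx[:sampleBoundary]]
train_x = x.iloc[shffleIdx[:sampleBoundary]]
test_x = x.iloc[shffleIdx[sampleBoundary:]]
test_y = y.iloc[shffleIdx[sampleBoundary:]]
# linear regression model
LR = sklearn.linear_model.LinearRegression()
LR.fit(train_x, train_y)
predict_y = LR.predict(test_x)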
The evaluation submodule 562 observes the industry trend from the plot of each model, which is more intuitive, as shown below:
# x-axis positions of the test samples, used for plotting
ysample = range(len(test_y))
# mean absolute error on the test set
error = np.linalg.norm(predict_y - test_y, ord=1) / len(test_y)
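The snippet above computes only the error, while the evaluation submodule is also said to assess R-squared and the regression coefficients. A self-contained sketch of all three follows. It runs on synthetic data, and the true coefficients, noise level, and variable names are assumptions introduced for illustration; plotting is left out.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data with known coefficients (1.5 and -0.5) plus small noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# 60/40 shuffled train/test split, mirroring sampleR = 0.6 in the text.
boundary = int(0.6 * len(y))
idx = rng.permutation(len(y))
train, test = idx[:boundary], idx[boundary:]

model = LinearRegression().fit(X[train], y[train])
pred = model.predict(X[test])

# Mean absolute error, the same formula as in the snippet above.
error = np.linalg.norm(pred - y[test], ord=1) / len(test)
# Coefficient of determination (R-squared) on the test set.
r2 = r2_score(y[test], pred)
print(error, r2, model.coef_)
```

Because the data is generated from known coefficients, the fitted `model.coef_` can be compared directly against 1.5 and -0.5 as a sanity check.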
The prediction saving submodule 563 uses model-based prediction, which compensates for the shortcomings of macro-level industry forecasting; because the prediction is made on precisely divided segmented industries, its accuracy can be improved. The procedure is as follows:
LR = sklearn.linear_model.LinearRegression()
LR.fit(x, y)
pre_y = LR.predict(x)
# save the prediction results
res = pd.DataFrame(pre_y)
res.to_csv('result.csv', header=False, index=False)
In addition, the life cycle prediction unit 6 mainly analyzes the life cycle stage the industry is in. Combining the life cycle with the industry's upward or downward development trend helps predict the industry trend comprehensively and accurately.
The above big-data-based system for accurately predicting industry trends obtains massive enterprise data, precisely positions each enterprise's segmented industry, and classifies, counts, and quantifies the massive enterprise data along the different dimensions of each segmented industry to form a micro-indicator library. The micro-indicator library is used to predict the upward or downward development trend of an industry, and a life cycle indicator library is established for each segmented industry to predict the industry's life cycle. This improves the accuracy of prediction and analysis, allows individual industries to be handled separately, compensates for the shortcomings of macro-level industry forecasting, and generalizes well.
The above only further illustrates the technical content of the present invention by way of embodiments so that readers can understand it more easily; it does not mean that the embodiments of the invention are limited to these. Any technical extension or re-creation made according to the present invention is protected by the present invention. The scope of protection of the present invention is defined by the claims.

Claims (10)

1. A big-data-based method for accurately predicting industry trends, characterized in that the method comprises:
obtaining massive enterprise data;
positioning the segmented industry of each enterprise according to the massive enterprise data;
obtaining a micro-indicator library according to the segmented industries and the massive enterprise data;
establishing a life cycle indicator library for each segmented industry;
for a sample industry to be predicted, predicting the upward or downward development trend of the sample industry using the micro-indicator library;
for the sample industry to be predicted, predicting the life cycle stage of the sample industry using the life cycle indicator library.
2. The big-data-based method for accurately predicting industry trends according to claim 1, characterized in that the step of positioning the segmented industry of each enterprise according to the massive enterprise data comprises the following specific steps:
establishing an industry database according to the massive enterprise data;
searching related information websites using the enterprise's full name to obtain the returned search content;
precisely analyzing the returned search content to obtain key word segments;
matching the key word segments, computing grouped statistics, and attaching labels to determine the enterprise's segmented industry.
3. The big-data-based method for accurately predicting industry trends according to claim 1, characterized in that the step of obtaining a micro-indicator library according to the segmented industries and the massive enterprise data comprises the following specific steps:
classifying the massive enterprise data according to the segmented industries;
analyzing the classified massive enterprise data according to each dimension of the segmented industries;
computing grouped statistics on the enterprise data of each dimension of each segmented industry;
quantifying the enterprise data of each dimension of each segmented industry to obtain the leading indicators of each dimension of each segmented industry;
integrating all leading indicators to form the micro-indicator library.
4. The big-data-based method for accurately predicting industry trends according to any one of claims 1 to 3, characterized in that the step of predicting, for a sample industry to be predicted, the upward or downward development trend of the sample industry using the micro-indicator library comprises the following specific steps:
obtaining the sample industry to be predicted and extracting time-series data from the sample industry data;
looking up the time-series data in the micro-indicator library and vectorizing the time-series data;
counting the frequency of the time-series data grouped by TID;
building a hierarchical index on TID;
computing the mean, standard deviation, and quantiles of the time-series data;
performing linear regression prediction on the frequency, mean, standard deviation, and quantiles of the time-series data.
5. The big-data-based method for accurately predicting industry trends according to claim 4, characterized in that the step of performing linear regression prediction on the frequency, mean, standard deviation, and quantiles of the time-series data comprises the following specific steps:
modeling the frequency, mean, standard deviation, and quantiles of the time-series data in groups to obtain regression models;
evaluating the error rate, R-squared, and regression coefficients of each model and plotting each model;
predicting the upward or downward development trend with each model and saving the prediction results.
6. A big-data-based system for accurately predicting industry trends, characterized in that it comprises a data acquisition unit, an industry positioning unit, a micro-indicator library acquisition unit, a life cycle indicator library establishing unit, a development trend prediction unit, and a life cycle prediction unit;
the data acquisition unit is configured to obtain massive enterprise data;
the industry positioning unit is configured to position the segmented industry of each enterprise according to the massive enterprise data;
the micro-indicator library acquisition unit is configured to obtain a micro-indicator library according to the segmented industries and the massive enterprise data;
the life cycle indicator library establishing unit is configured to establish a life cycle indicator library for each segmented industry;
the development trend prediction unit is configured to predict, for a sample industry to be predicted, the upward or downward development trend of the sample industry using the micro-indicator library;
the life cycle prediction unit is configured to predict, for the sample industry to be predicted, the life cycle stage of the sample industry using the life cycle indicator library.
7. The big-data-based system for accurately predicting industry trends according to claim 6, characterized in that the industry positioning unit comprises a database establishing module, a content acquisition module, a word segmentation acquisition module, and a processing module;
the database establishing module is configured to establish an industry database according to the massive enterprise data;
the content acquisition module is configured to search related information websites using the enterprise's full name and obtain the returned search content;
the word segmentation acquisition module is configured to precisely analyze the returned search content and obtain key word segments;
the processing module is configured to match the key word segments, compute grouped statistics, and attach labels to determine the enterprise's segmented industry.
8. The big-data-based system for accurately predicting industry trends according to claim 7, characterized in that the micro-indicator library acquisition unit comprises a classification module, an analysis module, a grouping statistics module, a quantization module, and an integration module;
the classification module is configured to classify the massive enterprise data according to the segmented industries;
the analysis module is configured to analyze the classified massive enterprise data according to each dimension of the segmented industries;
the grouping statistics module is configured to compute grouped statistics on the enterprise data of each dimension of each segmented industry;
the quantization module is configured to quantify the enterprise data of each dimension of each segmented industry to obtain the leading indicators of each dimension of each segmented industry;
the integration module is configured to integrate all leading indicators to form the micro-indicator library.
9. The big-data-based system for accurately predicting industry trends according to claim 8, characterized in that the development trend prediction unit comprises a time-series data preparation module, a data processing module, a frequency statistics module, a hierarchical index module, a parameter statistics module, and a regression prediction module;
the time-series data preparation module is configured to obtain the sample industry to be predicted and extract time-series data from the sample industry data;
the data processing module is configured to look up the time-series data in the micro-indicator library and vectorize the time-series data;
the frequency statistics module is configured to count the frequency of the time-series data grouped by TID;
the hierarchical index module is configured to build a hierarchical index on TID;
the parameter statistics module is configured to compute the mean, standard deviation, and quantiles of the time-series data;
the regression prediction module is configured to perform linear regression prediction on the frequency, mean, standard deviation, and quantiles of the time-series data.
10. The big-data-based system for accurately predicting industry trends according to claim 9, characterized in that the regression prediction module comprises a modeling submodule, an evaluation submodule, and a prediction saving submodule;
the modeling submodule is configured to model the frequency, mean, standard deviation, and quantiles of the time-series data in groups to obtain regression models;
the evaluation submodule is configured to evaluate the error rate, R-squared, and regression coefficients of each model and to plot each model;
the prediction saving submodule is configured to predict the upward or downward development trend with each model and to save the prediction results.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710423129.7A CN107301471A (en) 2017-06-07 2017-06-07 The accurate Forecasting Methodology of industrial trend and its system based on big data

Publications (1)

Publication Number Publication Date
CN107301471A (en) 2017-10-27

Family

ID=60134633

Country Status (1)

Country Link
CN (1) CN107301471A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647850A (en) * 2018-04-03 2018-10-12 杭州布谷科技有限责任公司 It is a kind of based on artificial intelligence colleges and universities aspiration make a report on decision-making technique and system
CN109993644A (en) * 2017-12-29 2019-07-09 航天信息股份有限公司 A kind of portrait determines method, apparatus, electronic equipment and storage medium
CN110084411A (en) * 2019-04-11 2019-08-02 企家有道网络技术(北京)有限公司 Learning model and implementation method for predicting to return
CN110110898A (en) * 2019-04-11 2019-08-09 企家有道网络技术(北京)有限公司 Based on the industry analysis method and device of enterprise's health indicator, server

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894316A (en) * 2010-06-10 2010-11-24 焦点科技股份有限公司 Method and system for monitoring indexes of international market prosperity conditions
CN102236827A (en) * 2010-04-26 2011-11-09 郑莉莉 Tax management analysis system
CN104123600A (en) * 2014-08-14 2014-10-29 国家电网公司 Electrical manager's index forecasting method for typical industry big data
US20140336791A1 (en) * 2013-05-09 2014-11-13 Rockwell Automation Technologies, Inc. Predictive maintenance for industrial products using big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination