CN110310012A - Data analysing method, device, equipment and computer readable storage medium - Google Patents
- Publication number
- CN110310012A (CN201910479378.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- fund raising
- grade
- model
- subset
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0637—Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
- G06Q10/06375—Prediction of business process outcome or impact based on a proposed change
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/90—Financial instruments for climate change mitigation, e.g. environmental taxes, subsidies or financing
Abstract
The present invention relates to the field of artificial intelligence and discloses a data analysis method comprising: obtaining the internal representation data and external representation data of an object according to a data analysis request and dividing them into grades; calculating the object's maximum fund raising range and the bearing capacity grade of its assets; selecting a corresponding model training algorithm according to the maximum fund raising range and the asset bearing capacity grade, and training a fund raising prediction model; and outputting a fund raising prediction result based on the enterprise data of the object to be predicted and the model. The invention also discloses a data analysis apparatus, a device and a computer-readable storage medium. Funds are prepared on the basis of the resulting fund raising mechanism, which avoids the mismatch between fund raising and income caused by blind fund raising plans. Because the fund raising plan is formed from both internal and external enterprise data, the accuracy of the system's planning analysis is greatly improved, the benefit of the enterprise's fund raising is maximized, and the enterprise's enthusiasm for raising funds for poverty alleviation is also improved.
Description
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a data analysis method, apparatus, device and computer-readable storage medium.
Background
With the continued development of artificial intelligence, especially in enterprise data statistics and operations planning, artificial intelligence can save an enterprise considerable human resources. In current technology, however, an enterprise's planning analysis system is deployed inside the enterprise and the data in the system must be kept confidential, so the system generally cannot connect to the external network. When the system analyzes data, it usually relies on the enterprise's internal historical data for the year being planned, together with data that is still valid. Because external information must be imported from the external network continually, the data is not updated in time, which leads to deviations and inaccuracies in the analysis. In addition, such systems do little analysis of large-scale assets and maximum capacity, so the enterprise's own planning tends toward one of two extremes: oversaturation or surplus. Oversaturation can seriously affect the survival of the enterprise, while surplus limits its development.
In particular, when an enterprise raises funds, data that is not updated in time makes the system's analysis incomplete. The analysis also tends to overlook the limits imposed by the enterprise's own development and capital, so the result can exceed what the enterprise can bear and the resulting plan is inaccurate. There is therefore currently no high-level system or method for analyzing fund raising mechanisms. As a consequence, data analysis is inaccurate and inefficient, fund raising cannot reasonably match the feedback mechanisms appropriate to enterprises of different scales, enterprise operations suffer, the enterprise faces considerable fund raising risk, and the likelihood that fund raising and income will match is reduced.
Summary of the invention
The main purpose of the present invention is to provide a data analysis method, apparatus, device and computer-readable storage medium, intended to solve the technical problem that existing systems analyze enterprise fund raising plans inaccurately because their data is not updated in time.
To achieve the above object, the present invention provides a data analysis method, which includes:
receiving a data analysis request sent by a terminal, and obtaining a corresponding object data set based on the object to be analyzed in the data analysis request, the object data set including at least the internal representation data and external representation data of the object;
dividing the internal representation data and external representation data of the object into grades according to preset fund raising grades to obtain at least one data subset, the data subsets corresponding one-to-one to the objects to be analyzed;
calculating, according to the data subset, the maximum fund raising range corresponding to the object to be analyzed and the maximum bearing capacity grade of its assets;
selecting a corresponding model training algorithm according to the maximum fund raising range and the maximum bearing capacity grade of its assets;
performing fund raising prediction training on the data subset according to the model training algorithm to obtain a fund raising prediction model, performing ecological deduction prediction on the object to be predicted based on the fund raising prediction model, and outputting a fund raising prediction result.
Optionally, after the step of obtaining the corresponding object data set based on the object to be analyzed in the data analysis request, the method further includes:
obtaining the data format of the data set used for training the fund raising prediction model, the data format including the label columns, the ordering of the label columns, and the storage location of the data;
adjusting the label columns in the internal representation data and external representation data to that ordering according to the data format, and detecting whether any label column is missing or redundant;
if a label column is missing from the external representation data and internal representation data, adding the missing label column at the corresponding position in the internal representation data and external representation data and filling it with blank data, to form a standardized object data set;
if a redundant label column exists in the external representation data and internal representation data, deleting the redundant label column and its corresponding data from the data set, or masking them and marking them invalid, to form a standardized object data set.
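The column standardization described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the column names in `TARGET_COLUMNS`, the function name `standardize` and the use of `None` as the blank value are all assumptions introduced for the example, and redundant columns are simply dropped rather than masked.

```python
from __future__ import annotations

from typing import Any

# Hypothetical target format: the ordered label columns used by the training data set.
TARGET_COLUMNS = ["enterprise_rank", "impact_index", "annual_income", "fund_raising_category"]


def standardize(
    record: dict[str, Any], target_columns: list[str], blank: Any = None
) -> tuple[dict[str, Any], list[str], list[str]]:
    """Reorder a record to the target column sequence.

    Returns the standardized record plus the lists of columns that were
    missing (added with a blank value) and redundant (dropped).
    """
    missing = [c for c in target_columns if c not in record]
    redundant = [c for c in record if c not in target_columns]
    # Building the dict in target order both reorders the columns and drops
    # anything that is not part of the target format.
    standardized = {c: record.get(c, blank) for c in target_columns}
    return standardized, missing, redundant


record = {"annual_income": 5.0, "enterprise_rank": 3, "notes": "unverified"}
standardized, missing, redundant = standardize(record, TARGET_COLUMNS)
print(standardized)
print("missing:", missing, "redundant:", redundant)
```

Returning the missing and redundant column names alongside the record keeps the detection step visible, so a caller can log or mask them instead of silently discarding data.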
Optionally, after dividing the internal representation data and external representation data of the object into grades according to preset fund raising grades to obtain at least one data subset, the method further includes:
scoring the at least one data subset using the weight ratio coefficients in a preset scoring model to obtain scoring results;
sorting the data subsets in descending order of score, and selecting the N highest-scoring data subsets as the valid data set for training the fund raising prediction model, where N >= 1.
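The scoring and top-N selection above can be sketched as a weighted sum followed by a descending sort. The source does not specify the scoring model, so the weighted-sum form, the feature names and the weights below are assumptions made only for illustration.

```python
from __future__ import annotations


def score_subset(features: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of a subset's summary features (assumed form of the scoring model)."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


def select_top_n(
    subsets: dict[str, dict[str, float]], weights: dict[str, float], n: int = 1
) -> list[str]:
    """Return the names of the n highest-scoring subsets, best first."""
    if n < 1:
        raise ValueError("N must be at least 1")
    scores = {name: score_subset(feats, weights) for name, feats in subsets.items()}
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:n]


# Hypothetical summary features per data subset and hypothetical weights.
subsets = {
    "a": {"impact": 0.9, "income": 0.2},
    "b": {"impact": 0.4, "income": 0.8},
    "c": {"impact": 0.1, "income": 0.1},
}
weights = {"impact": 0.5, "income": 0.5}
print(select_top_n(subsets, weights, n=2))
```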
Optionally, after dividing the internal representation data and external representation data of the object into grades according to preset fund raising grades to obtain at least one data subset, the method further includes:
performing feature analysis on the data subset to obtain the data features shared by the data in the data subset;
performing feature derivation on the data features to obtain data similar to the data in the data subset, where feature derivation means further subdividing the data features or extending them to similar features.
Optionally, performing fund raising prediction training on the data subset according to the model training algorithm to obtain a fund raising prediction model, performing ecological deduction prediction on the object to be predicted based on the fund raising prediction model, and outputting a fund raising prediction result includes:
when training with the LightGBM training algorithm, matching, according to the grade division result of the data subset, the LightGBM model training architecture corresponding to the grade of the data subset, and inputting the data subset into the model architecture for training to obtain the fund raising prediction model, where the fund raising prediction model is:
(formula not reproduced in the source text)
where Obj is the output result of the fund raising prediction model, n > 1, y_i is the label value of the label column after the representation data is standardized, ŷ_i is the estimated value of y_i, f_t(x_t) = f(x) is the approximation function for the feature values in the representation data, f_t is the approximate target value when the feature value is x_t, i is the index of the label column entries, x_t = x is the feature value corresponding to y_i, t is the index of the feature values, and constant, L and Ω denote constant terms;
obtaining the internal representation data and external representation data of the object to be predicted, inputting them into the fund raising prediction model, and outputting the fund raising grade corresponding to the object to be predicted;
querying, according to the fund raising grade, the corresponding fund raising assessment report from a preset table of correspondences between fund raising grades and fund raising assessment reports.
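The formula for the fund raising prediction model does not appear in this text. The symbols defined above (Obj, n, y_i, ŷ_i, f_t, L, Ω and a constant) match the usual gradient-boosting training objective, so under that assumption, which the source does not confirm, the model may take a form such as:

```latex
\mathrm{Obj} \;=\; \sum_{i=1}^{n} L\bigl(y_i,\; \hat{y}_i + f_t(x_t)\bigr) \;+\; \Omega(f_t) \;+\; \text{constant}
```

In the standard formulation L is the loss between the label value and its updated estimate and Ω is a regularization term on the newly added tree f_t; the source text describes L, Ω and constant only as constant terms, so this reading is an interpretation rather than a statement of the patent's formula.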
Optionally, after querying, according to the fund raising grade, the corresponding fund raising assessment report from the preset table of correspondences between fund raising grades and fund raising assessment reports, the method further includes:
calculating the minimum value of the fund raising prediction model, and judging the feasibility of the generated fund raising report based on that minimum value, the minimization formula being:
(formula not reproduced in the source text)
where R1(j, s) = {x | x^(j) ≤ s} and R2(j, s) = {x | x^(j) > s} are the respective dimension intervals of the representation data, i is the index of the label column entries, j is the splitting feature of the representation data, s is the split point, and c_i is the absolute value corresponding to y_i.
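The minimization formula itself does not appear in this text. The definitions of R1(j, s) and R2(j, s) match the least-squares split search used in regression trees, so the sketch below assumes that form: for each splitting feature j and split point s, it sums the squared deviations of y_i from the mean on each side and keeps the pair with the smallest total. This is an assumption rather than the patent's confirmed formula, and `sse` and `best_split` are illustrative names.

```python
from __future__ import annotations


def sse(values: list[float]) -> float:
    """Sum of squared deviations from the mean (the mean is the minimizing constant)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X: list[list[float]], y: list[float]) -> tuple[float, int, float] | None:
    """Search every feature j and split point s for the smallest two-sided squared error.

    Returns (loss, j, s), or None when no feature admits a split.
    """
    best = None
    for j in range(len(X[0])):
        # Dropping the largest value guarantees that R2 = {x | x[j] > s} is non-empty.
        for s in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= s]   # targets in R1(j, s)
            right = [yi for row, yi in zip(X, y) if row[j] > s]   # targets in R2(j, s)
            loss = sse(left) + sse(right)
            if best is None or loss < best[0]:
                best = (loss, j, s)
    return best


X = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]]
y = [1.0, 1.0, 9.0, 9.0]
print(best_split(X, y))
```

How the resulting minimum would be compared against a threshold to judge the feasibility of a report is not described in the source and is left out of the sketch.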
In addition, to achieve the above object, the present invention also provides a data analysis apparatus, which includes:
a data acquisition module, configured to receive a data analysis request sent by a terminal and to obtain a corresponding object data set based on the object to be analyzed in the data analysis request, the object data set including at least the internal representation data and external representation data of the object;
a data grading module, configured to divide the internal representation data and external representation data of the object into grades according to preset fund raising grades to obtain at least one data subset, the data subsets corresponding one-to-one to the objects to be analyzed;
a calculation module, configured to calculate, according to the data subset, the maximum fund raising range corresponding to the object to be analyzed and the maximum bearing capacity grade of its assets;
a prediction module, configured to select a corresponding model training algorithm according to the maximum fund raising range and the maximum bearing capacity grade of its assets, to perform fund raising prediction training on the at least one data subset according to the model training algorithm to obtain a fund raising prediction model, to perform ecological deduction prediction on the object to be predicted based on the fund raising prediction model, and to output a fund raising prediction result.
Optionally, the data analysis apparatus further includes a format conversion module, configured to: obtain the data format of the data set used for training the fund raising prediction model, the data format including the label columns, the ordering of the label columns, and the storage location of the data; adjust the label columns in the internal representation data and external representation data to that ordering according to the data format, and detect whether any label column is missing or redundant; if a label column is missing from the external representation data and internal representation data, add the missing label column at the corresponding position in the internal representation data and external representation data and fill it with blank data, to form a standardized object data set; and if a redundant label column exists in the external representation data and internal representation data, delete the redundant label column and its corresponding data from the data set, or mask them and mark them invalid, to form a standardized object data set.
Optionally, the data analysis apparatus further includes a scoring module, configured to score the data subsets using the weight ratio coefficients in a preset scoring model to obtain scoring results, to sort the data subsets in descending order of score, and to select the N highest-scoring data subsets as the valid data set for training the fund raising prediction model, where N >= 1.
Optionally, the data analysis apparatus further includes a derivation module, configured to perform feature analysis on the data subset to obtain the data features shared by the data in the data subset, and to perform feature derivation on the data features to obtain data similar to the data in the data subset, where feature derivation means further subdividing the data features or extending them to similar features.
Optionally, the prediction module includes a model training unit and a report generation unit.
The model training unit is configured to, when training with the LightGBM training algorithm, match, according to the grade division result of the data subset, the LightGBM model training architecture corresponding to the grade of the data subset, and input the data subset into the model architecture for training to obtain the fund raising prediction model, where the fund raising prediction model is:
(formula not reproduced in the source text)
where Obj is the output result of the fund raising prediction model, n > 1, y_i is the label value of the label column after the representation data is standardized, ŷ_i is the estimated value of y_i, f_t(x_t) = f(x) is the approximation function for the feature values in the representation data, f_t is the approximate target value when the feature value is x_t, i is the index of the label column entries, x_t = x is the feature value corresponding to y_i, t is the index of the feature values, and constant, L and Ω denote constant terms.
The report generation unit is configured to obtain the internal representation data and external representation data of the object to be predicted, input them into the fund raising prediction model, and output the fund raising grade corresponding to the object to be predicted; and to query, according to the fund raising grade, the corresponding fund raising assessment report from a preset table of correspondences between fund raising grades and fund raising assessment reports.
Optionally, the data analysis apparatus further includes a judgment module, configured to calculate the minimum value of the fund raising prediction model and to judge the feasibility of the generated fund raising report based on that minimum value, the minimization formula being:
(formula not reproduced in the source text)
where R1(j, s) = {x | x^(j) ≤ s} and R2(j, s) = {x | x^(j) > s} are the respective dimension intervals of the representation data, i is the index of the label column entries, j is the splitting feature of the representation data, s is the split point, and c_i is the absolute value corresponding to y_i.
In addition, to achieve the above object, the present invention also provides a data analysis device, which includes a memory, a processor, and a data analysis program stored in the memory and executable on the processor, the data analysis program implementing the steps of the data analysis method described in any of the above when executed by the processor.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium on which a data analysis program is stored, the data analysis program implementing the steps of the data analysis method described in any of the above when executed by a processor.
By obtaining the internal representation data and external representation data of an enterprise according to a data analysis request, the present invention performs a planning analysis of fund raising data to form a preliminary fund raising analysis result, then carries out an enterprise ecological balance deduction based on that result to generate a fund raising plan for the enterprise, thereby forming a mechanism in which the enterprise's fund raising and income are coordinated. Funds are prepared on the basis of this mechanism, which avoids the mismatch between fund raising and income caused by blind fund raising plans. Because the fund raising plan is formed from both internal and external enterprise data, the accuracy of the system's planning analysis is greatly improved, the benefit of the enterprise's fund raising is maximized, and the enterprise's enthusiasm for raising funds for poverty alleviation is also improved.
Brief description of the drawings
Fig. 1 is a flow diagram of the first embodiment of the data analysis method provided by the present invention;
Fig. 2 is a flow diagram of the second embodiment of the data analysis method provided by the present invention;
Fig. 3 is a functional block diagram of one embodiment of the data analysis apparatus provided by the present invention;
Fig. 4 is a structural schematic diagram of the server operating environment involved in embodiments of the present invention.
The realization of the object, the functional features and the advantages of the present invention will be further described with reference to the accompanying drawings and in conjunction with the embodiments.
Specific embodiment
It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.
In the present invention, the data analysis method provided is generally a prediction method for fund raising planning, which plans fund raising data so as to balance an enterprise's poverty-alleviation fund raising against its income. It can of course also be used for the planning analysis of other business. The method may be implemented through an existing poverty-alleviation fund raising system; preferably, it is implemented by adding software code that realizes the method to the existing system. The physical implementation of the system may be a personal computer (PC), a server, a smartphone, or the like. Based on such a hardware structure, the embodiments of the data analysis method of the present invention are proposed.
Referring to Fig. 1, Fig. 1 is a flow chart of the data analysis method provided by an embodiment of the present invention. In this embodiment, the data analysis method specifically includes the following steps:
Step S110: receive a data analysis request sent by a terminal, and obtain a corresponding object data set based on the object to be analyzed in the data analysis request;
In this step, the object data set includes at least the internal representation data and external representation data of the object. The object data set may be obtained from an existing enterprise credit system or from review websites on the internet, and is mainly used to judge the enterprise's resources and bearing capacity. The representation data includes the enterprise's ranking, influence index, scale, annual income, type and category, its fund raising intensity in past years, its fund raising categories, the publicity intensity it requires, its publicity market, and venues for marketing or exhibitions.
The enterprise ranking includes rankings within the enterprise's locality by operating condition, taxes paid, total assets and credit standing, and may also include a ranking by profit; depending on the application, the enterprise's nationwide ranking may be obtained as well.
Step S120: divide the internal representation data and external representation data of the object into grades according to preset fund raising grades to obtain at least one data subset, the data subsets corresponding one-to-one to the objects to be analyzed;
Step S130: calculate, according to the data subset, the maximum fund raising range corresponding to the object to be analyzed and the maximum bearing capacity grade of its assets;
In this embodiment, the object is an enterprise. In other words, steps S120 and S130 perform a planning analysis of fund raising data on the enterprise data set to obtain a preliminary fund raising analysis result for the enterprise, which includes the enterprise's maximum fund raising range and the maximum bearing capacity grade of its assets. The planning analysis in this step mainly analyzes the bearing capacity of the enterprise's assets. Asset bearing capacity is data that reflects the enterprise's state of development relatively well and also facilitates planning its future development. Fund raising is one mode of enterprise development: it can both advance the enterprise's own development and provide support to outside parties.
Calculating asset bearing capacity requires combining the enterprise's tangible and intangible assets. Intangible assets are the standing that the outside world accords the enterprise as a result of sound management; they can be regarded as a measure of the enterprise's credit, and serve as a trust resource that supports the enterprise when it actually raises funds.
The maximum fund raising range and the bearing capacity grade must also take into account the fields in which the enterprise operates, for example its main line of development or the products it makes. They are calculated according to the enterprise's type and the industry it serves; an enterprise cannot raise funds arbitrarily in any industry or field.
For example, the maximum fund raising capacity the enterprise can currently bear is judged from its current net income and debt, and the fund raising amount is determined from the grade of that capacity. Once the fund raising amount is determined, the grade of the enterprise's bearing capacity is determined; this grade may be calculated by jointly considering factors such as the enterprise's current business trend, actual income and expenditure, and state of development.
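The example above can be illustrated with a small sketch. The source gives no formula or thresholds, so everything numeric here is hypothetical: the rule that the amount is a fixed share of net income minus debt, the share `MAX_SHARE_OF_SURPLUS`, and the cut-offs in `GRADE_THRESHOLDS` are assumptions chosen only to show the two-stage flow (amount first, then grade).

```python
from dataclasses import dataclass


@dataclass
class Finances:
    net_income: float  # current net income of the enterprise
    debt: float        # current outstanding debt


# Hypothetical parameters; the source text does not give concrete values.
MAX_SHARE_OF_SURPLUS = 0.10
GRADE_THRESHOLDS = [(1_000_000.0, "high"), (100_000.0, "medium")]


def max_fund_raising_amount(finances: Finances) -> float:
    """Assumed rule: a fixed share of whatever net income remains after debt."""
    surplus = max(finances.net_income - finances.debt, 0.0)
    return surplus * MAX_SHARE_OF_SURPLUS


def bearing_capacity_grade(amount: float) -> str:
    """Map a fund raising amount to a low / medium / high grade."""
    for threshold, grade in GRADE_THRESHOLDS:
        if amount >= threshold:
            return grade
    return "low"


enterprise = Finances(net_income=20_000_000.0, debt=5_000_000.0)
amount = max_fund_raising_amount(enterprise)
print(amount, bearing_capacity_grade(amount))
```

A fuller version would also weigh business trend, income and expenditure, and state of development, as the text indicates; those inputs are omitted because the source does not say how they are combined.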
Step S140: select a corresponding model training algorithm according to the maximum fund raising range and the maximum bearing capacity grade of its assets;
In practical applications, the model training algorithm may be selected by means of a correspondence table. That is, based on actual fund raising circumstances, the user estimates the company's fund raising range in advance from factors such as its assets and income, and divides the fund raising range into grades such as low, medium and high. The user then selects a corresponding model training algorithm for each grade and finally creates a mapping table. In actual use, the maximum fund raising range and the maximum bearing capacity grade of the assets serve as the retrieval conditions for selecting the corresponding model training algorithm from the mapping table.
In this step, the algorithm may also be selected by means other than a correspondence table, for example from the company's historical fund raising records: the company's internal fund raising history is searched according to the maximum fund raising range and the maximum bearing capacity grade of the assets, a record with a similar grade is selected, and the model training algorithm in that record is extracted, thereby selecting the model training algorithm.
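Both selection routes described above, the mapping table and the fallback to a similar historical record, can be sketched together. The grade names follow the low, medium and high division mentioned in the text; the table contents, the algorithm names other than LightGBM, and the distance measure used to find a "similar" record are assumptions made for illustration.

```python
from __future__ import annotations

GRADE_ORDER = {"low": 0, "medium": 1, "high": 2}


def select_algorithm(
    range_grade: str,
    capacity_grade: str,
    table: dict[tuple[str, str], str],
    history: list[tuple[str, str, str]] | None = None,
) -> str:
    """Look up the training algorithm for a pair of grades.

    table maps (fund raising range grade, bearing capacity grade) to an
    algorithm name. If the pair is absent, fall back to the historical record
    (range grade, capacity grade, algorithm) whose grades are closest.
    """
    key = (range_grade, capacity_grade)
    if key in table:
        return table[key]
    if not history:
        raise LookupError(f"no algorithm registered for {key}")

    def distance(record: tuple[str, str, str]) -> int:
        rec_range, rec_capacity, _ = record
        return abs(GRADE_ORDER[rec_range] - GRADE_ORDER[range_grade]) + abs(
            GRADE_ORDER[rec_capacity] - GRADE_ORDER[capacity_grade]
        )

    return min(history, key=distance)[2]


# Hypothetical table and history; only LightGBM is named in the source text.
table = {("high", "high"): "lightgbm"}
history = [("low", "low", "algorithm_a"), ("medium", "high", "algorithm_b")]
print(select_algorithm("high", "high", table, history))
print(select_algorithm("medium", "medium", table, history))
```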
Step S150: perform fund raising prediction training on the at least one data subset according to the model training algorithm to obtain a fund raising prediction model, perform ecological deduction prediction on the object to be predicted based on the fund raising prediction model, and output a fund raising prediction result.
In this embodiment, the object to be predicted is the name of the enterprise for which the user needs a fund raising planning forecast. The object to be analyzed is the name of an enterprise used for model training; there may be several such enterprises, and they mainly serve to supply the data for training the model. Predicting fund raising with the model is essentially a process of deducing the enterprise's overall break-even ecology. The deduction of enterprise ecological balance is a deduction of the degree of balance between the enterprise's fund raising and its income: based on the preliminary fund raising analysis result, it simulates whether the enterprise, raising funds according to the current analysis result, can reach maximum fund raising saturation while its survival remains assured. Concretely, the balance level between fund raising and income can be deduced from the maximum fund raising range and the maximum bearing capacity grade of the assets; the corresponding fund raising amount is determined from that balance level, and a fund raising plan can even be drawn up in advance, or preparations for fund raising can begin directly according to the fund raising scale.
In this embodiment, step S120 performs a preliminary grading of the enterprise from data in its representation data such as enterprise ranking, influence index, scale, annual income, type, category, fund raising intensity in past years, fund raising categories, required publicity intensity and publicity market. For example, the enterprise's operating condition is first analyzed from its influence index, annual income, past fund raising intensity and publicity situation. If the operating condition is good, a deeper computational analysis is performed, namely a machine-based analysis combined with the representation data of more enterprises, to obtain the final fund raising intensity and, on the premise of that intensity, the maximum bearing capacity grade of the enterprise's assets.
Determining the corresponding fund raising plan from an assessment of the enterprise in this way lets the enterprise carry out fund raising more effectively, which improves its enthusiasm for raising funds for poverty alleviation and also ensures that poverty-alleviation funds are fully implemented and used.
In this embodiment, in step S110, after obtaining the corresponding object data set based on the object to be analyzed in the data analysis request, the method further includes:
preprocessing the representation data of the object data set, the preprocessing being formatting the representation data according to the data format required by the data analysis system to obtain a standardized object data set.
In practical applications, standardizing the data format essentially converts the enterprise representation data in these object data sets into data of a fixed format, which facilitates subsequent calculation. Simplifying data processing in this way prevents disorderly data from affecting the calculation results, which improves the accuracy of the calculation and of the final prediction of the enterprise's fund raising scale.
In the present embodiment, preprocessing the representation data of the object data set includes:
Obtaining the data format of the training data set used by the fund raising prediction model, the data format including the label columns, the order of the label columns and the storage position of the data;
Adjusting the order of the label columns in the internal representation data and external representation data according to the data format, and detecting whether any label column is missing or redundant;
If a label column is missing from the external representation data and internal representation data, adding the missing label column at the corresponding position in the internal representation data and external representation data and filling in blank data, to form a standardized object data set;
If a redundant label column exists in the external representation data and internal representation data, deleting the redundant label column and its corresponding data from the data set, or masking them and marking them invalid, to form a standardized object data set.
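The column alignment described in the steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the column names, the `EXPECTED_COLUMNS` schema and the function `standardize_record` are hypothetical, and a record is assumed to be a plain dictionary.

```python
# Hypothetical schema: the ordered label columns expected by the training data set.
EXPECTED_COLUMNS = ["enterprise_rank", "annual_income", "fund_raising_history"]


def standardize_record(record, expected=EXPECTED_COLUMNS, fill_value=None):
    """Reorder one representation-data record to the expected label columns.

    Missing columns are added with a blank value; redundant columns are dropped.
    Returns the standardized record plus the lists of missing and redundant columns.
    """
    missing = [c for c in expected if c not in record]
    redundant = [c for c in record if c not in expected]
    standardized = {c: record.get(c, fill_value) for c in expected}
    return standardized, missing, redundant


# Example record with one redundant column and one missing column.
raw = {"annual_income": 1200, "enterprise_rank": 15, "fax_number": "n/a"}
row, missing, redundant = standardize_record(raw)
```

Here the blank value is `None`; the text later suggests filling such blanks with tree-model predictions, which this sketch does not attempt.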
That is, the raw data are standardized by masking the label column, filling missing columns with values predicted by a tree model, checking for outliers with box plots, and so on. Nothing is added to the data; rather, the raw data are cleaned with a number of data cleaning methods. The label column is masked precisely because it is the label column: apart from model training and validation, we use the label column as sparingly as possible, because this column of data is extremely important. Therefore, when filling missing values with a tree model, we prefer to remove this column, reducing the influence of its values, and consider only the other representation data. In other words, the label column is not used; in many cases it is masked during data combing and feature engineering.
In practical applications, the representation data set obtained is generally in a fairly regular format, stored in data tables. When a data table is exported by the enterprise or by a statistics company, it carries various label columns, and these label columns are not needed for the fund raising plan prediction in this application. The preprocessing here can therefore mask the label columns in the obtained representation data, or check the representation data for missing or redundant items, in order to modify the format of the data.
For example, if the data checked have missing items, label columns are added so that empty parameters are added to the representation data and the representation data meet the preset format requirement; preferably, the representation data are standardized by filling missing columns with tree-model predictions, box-plot outlier checks and similar methods. If the data checked are redundant, the redundant information is removed by masking label columns and deleting data.
Applying the same format processing to the raw data in this way standardizes the data and avoids deviations in the subsequent grade assessment caused by diverse data formats.
In the present embodiment, performing the fund raising planning analysis on the obtained object data set to obtain the preliminary fund raising analysis result of the enterprise includes:
Grading the preprocessed representation data according to preset grades to obtain multiple data subsets, the data subsets corresponding one-to-one with the enterprises;
Calculating, from the data subsets, the maximum fund raising range corresponding to the enterprise in the representation data and the maximum bearing capacity grade of its assets.
The representation data is essentially also a data set, including external representation data and internal representation data. The grading of the representation data set can be implemented by feature bucketing: in practical applications, a data feature is set in advance for each grade, and at grading time the preset data features are compared with the data features in the representation data set to grade the representation data set.
In practical applications, the data feature of each grade corresponds to a different fund raising grade. For example, suppose the fund raising grades of enterprises are divided into 10 grades. When grading, the keywords in the representation data, after conversion to the prescribed format, are compared with the data features of each of the 10 grades, and the grade is determined from the comparison result. Suppose the representation data are business data of a single type recorded for the enterprise over a period of time. When grading, the representation data are first segmented by small time intervals to obtain multiple small sets; each small set is then compared with the data features of the 10 grades, and when its data feature reaches the value corresponding to a grade, the small set is assigned to that grade. After the comparison is complete, the small sets form at least one data subset.
Take as an example a representation data set consisting of continuous data such as the enterprise's annual income. The data can be graded as follows: the continuous annual income is divided by time into multiple small sets, and each grade corresponds to an average annual income amount. The average amount in each small set is compared with the amount corresponding to each grade; a result of 1000W or more corresponds to a factor variable of 10 (grade 10), 500W or more corresponds to 8, and so on.
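The annual income example can be sketched in a few lines. Only the first two thresholds (1000W gives 10, 500W gives 8) come from the text; the remaining rows of `GRADE_THRESHOLDS`, the period length and the function names are illustrative assumptions.

```python
# Thresholds are in units of "W" as used in the text. Only the first two rows
# come from the text; the remaining rows are illustrative assumptions.
GRADE_THRESHOLDS = [(1000, 10), (500, 8), (200, 6), (50, 4), (0, 2)]


def split_by_period(values, period_len):
    """Cut a continuous income series into consecutive small sets."""
    return [values[i:i + period_len] for i in range(0, len(values), period_len)]


def grade_for(subset):
    """Compare the average income of a small set with each grade's threshold."""
    avg = sum(subset) / len(subset)
    for threshold, grade in GRADE_THRESHOLDS:
        if avg >= threshold:
            return grade
    return 0  # below every threshold (for example, a negative average)


incomes = [1200, 1100, 800, 600, 300, 100]  # hypothetical yearly incomes in W
grades = [grade_for(s) for s in split_by_period(incomes, 2)]
```

With these assumed thresholds, the three two-year sets have averages 1150, 700 and 200, so they fall into grades 10, 8 and 6.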
Of course, the amounts above are only an example to aid understanding, and some data are not amounts. Some data are of text type and must likewise be graded or discretized after text matching in the manner described above, except that the text is compared against specific keywords. The grade division above is made according to the actual situation of the enterprises raising funds. For example, when the overall strength of the enterprises participating in the current fund raising assessment is relatively high, fewer grades can be used and the requirement of each grade is relatively high; when the overall strength of the participating enterprises is relatively low, more grades are used and the requirement of the lower grades is relatively low, which mainly makes participation easier for small enterprises and helps broaden poverty alleviation.
Of course, the grading above can be implemented with a grading model, and the analysis model is trained on the standardized data. In other words, model training can use the data labels directly, or it can be based on the data features used in grading.
In the present embodiment, after the internal representation data and external representation data of the object are graded according to the preset fund raising grades to obtain at least one data subset, the method further includes:
Performing feature analysis on the data subset and extracting the data features shared by the data in the data subset;
Deriving data features from the extracted data features to expand the data similar to the data in the data subset, where derivation means further subdividing a data feature or extending it to similar features, so that the division of the representation data set is more accurate.
In practical applications, derivation need not be applied to all data features and can be applied to only some of them. Preferably, the preset grade data features are derived according to the actual grading of the enterprise's data. Suppose a certain feature column is derived by differentiation, for example by converting the feature to a one-hot type. Not every data item has to be processed; features are screened methodically according to the characteristics of the data themselves, and this is an experimental process. Several processing methods for different representation data can be tried, and the best feature engineering algorithm is finally selected based on the results.
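As a small illustration of converting a feature column to a one-hot type, the following sketch encodes a hypothetical enterprise-type column. The function name `one_hot` and the sample values are assumptions made for the example.

```python
def one_hot(values):
    """Turn one categorical column into one 0/1 column per distinct category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Hypothetical "enterprise type" column for three enterprises.
cats, rows = one_hot(["manufacturing", "retail", "manufacturing"])
```

Each original value becomes a row with exactly one 1, positioned at the index of its category in `cats`.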
Further, the feature derivation step is implemented as follows. During the grading comparison, it is also judged whether each data feature can be derived; derivation here is of course still based on data features of the same grade. Whether derivation is possible is verified against the type and scenario of the data features in the grade. Derivation must also take the reasonableness of the representation data into account and must not go beyond the actual development of the enterprise's representation data: excessive derivation biases the subsequently trained model and makes the enterprise's fund raising assessment inaccurate. For example, if the current type of a data feature is A and type B is similar to type A, the development of the representation data is used to judge whether derivation to type B is possible. If so, derivation is carried out and the other features under type B are used as features for comparison, which increases the number of features while still satisfying the enterprise deduction requirement and greatly improves the accuracy of the assessment.
In the present embodiment, feature derivation includes two kinds, feature differentiation and feature search-and-add. For feature differentiation, a differentiation method is selected according to the type of the data feature or the category of the data subset, and each data feature is split with that method; each small feature split out belongs to the same type as the original data feature or is a phrase with the same meaning.
For feature search-and-add, words are formed according to the semantics of the data feature in the data subset to obtain more similar features, and features belonging to the same category as the data in the data subset are then selected from these similar features.
In the present embodiment, in the step of generating the recommended fund raising plan, the fund raising plan can also be generated by training a model on the obtained object data set. Specifically, a fund raising deduction model is obtained by training, and the corresponding fund raising plan is output by inputting the corresponding business data into this model for prediction.
In practical applications, the model is trained on the representation data after grading, to obtain the fund raising prediction model.
In this case, the model is trained on known enterprise representation data that have been obtained, with the enterprise itself or an expert labelling the exported data table. For example, enterprise A has portrait dimensions f1 to fn and label 1, with mappings for enterprise fund raising, publicity strength and publicity market; likewise, enterprise B has f1 to fn and label 2, and so on. In this way a data set with known labels is collected, and the model is built to learn regularities from this labelled data. Take the simplest linear model, y = a1f1 + a2f2 + a3f3 + ...: training the model on the labelled data yields the values of a1, a2, a3, where f(n) is the representation data and the resulting a(n) are the coefficients of the model function.
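To make the linear model y = a1f1 + a2f2 + ... concrete, the sketch below recovers the coefficients from labelled rows by ordinary least squares. The text does not say how the coefficients are fitted, so the least-squares choice, the function `fit_linear` and the toy data are assumptions.

```python
def fit_linear(features, labels):
    """Least-squares fit of y = a1*f1 + ... + an*fn (no intercept).

    Solves the normal equations (F^T F) a = F^T y by Gauss-Jordan elimination.
    Assumes F^T F is invertible; a singular system raises ZeroDivisionError.
    """
    n = len(features[0])
    A = [[sum(row[i] * row[j] for row in features) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * y for row, y in zip(features, labels)) for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * p for x, p in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(n)]


# Toy labelled data generated from y = 2*f1 + 1*f2.
F = [[1, 0], [0, 1], [1, 1], [2, 1]]
y = [2, 1, 3, 5]
coeffs = fit_linear(F, y)
```

On this toy data the fit returns coefficients close to a1 = 2 and a2 = 1, the values used to generate the labels.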
In this case, besides training with the linear model above, it is preferred to train with a LightGBM model. Training this kind of model uses gradients to split and select the feature data, which reduces the amount of data to be computed and greatly improves training efficiency. The training is as follows: the object data set is first divided into multiple data subsets, and training takes the data subsets as input.
Input: training data, number of iterations d, sampling rate a for large-gradient data, sampling rate b for small-gradient data, the loss function and the type of weak learner (generally a decision tree);
Output: the trained strong learner;
(1) sort the sample points in descending order of the absolute value of their gradients;
(2) take the first a*100% of the sorted samples to form a subset of large-gradient sample points;
(3) from the remaining (1-a)*100% of the samples, randomly select b*(1-a)*100% of the sample points to form a set of small-gradient sample points;
(4) merge the large-gradient samples with the sampled small-gradient samples;
(5) multiply the small-gradient samples by the weight coefficient (1-a)/b;
(6) learn a new weak learner from the sampled data;
(7) repeat steps (1) to (6) until the specified number of iterations is reached or the model converges.
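Steps (1) to (5) can be sketched as a single sampling function. This is an illustrative sketch only: the function name `goss_sample` and the fixed random seed are assumptions, and the number of small-gradient samples is taken as b times the total sample count, as in the published gradient-based one-side sampling algorithm, which is the choice that pairs with the weight (1-a)/b. Step (3) as worded above would draw b*(1-a) of the total instead, so treat the count as an assumption.

```python
import random


def goss_sample(gradients, a, b, seed=0):
    """Return (indices, weights) chosen by gradient-based one-side sampling."""
    n = len(gradients)
    # (1) sort sample indices by absolute gradient, largest first
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_n = round(a * n)
    rand_n = round(b * n)
    # (2) keep every large-gradient sample
    large = order[:top_n]
    # (3) randomly draw small-gradient samples from the rest
    rest = order[top_n:]
    rng = random.Random(seed)
    small = rng.sample(rest, min(rand_n, len(rest)))
    # (4) merge, and (5) up-weight the small-gradient samples by (1 - a) / b
    weights = {i: 1.0 for i in large}
    weights.update({i: (1 - a) / b for i in small})
    indices = large + small
    return indices, [weights[i] for i in indices]


grads = [0.9, -0.1, 0.05, -0.8, 0.2, 0.01, -0.3, 0.02, 0.6, -0.04]
idx, w = goss_sample(grads, a=0.2, b=0.3)
```

Step (6) would then fit a weak learner on the samples in `idx` using the weights in `w`, and step (7) would repeat the whole procedure with fresh gradients.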
With the algorithm above, the time needed for model learning is greatly reduced without changing the data distribution and without losing learner accuracy.
In the present embodiment, training for fund raising prediction on the at least one data subset according to the model training algorithm to obtain a fund raising prediction model, and predicting the ecological engineering of the object to be predicted based on the fund raising prediction model to output a fund raising prediction result, includes:
When training with the training algorithm of the LightGBM model, a LightGBM model training architecture corresponding to the grade of the data subset is matched according to the grading result of the data subset, and the data subset is input into the model architecture for training to obtain the fund raising prediction model, where the fund raising prediction model is:
Where Obj is the output of the fund raising prediction model, n > 1, $y_i$ is the label value of the label column after the representation data are standardized, $\hat{y}_i$ is the estimate of $y_i$, $f_t(x_t) = f(x)$ is the approximate calculation function of the feature values in the representation data, $f_t$ is the approximate target value when the feature value is $x_t$, $i$ is the index of the label column entry, $x_t = x$ is the feature value corresponding to $y_i$, $t$ is the index of the feature value, and constant, $L$ and $\Omega$ denote constant terms;
Obtaining the internal representation data and external representation data of the object to be predicted and inputting them into the fund raising prediction model, which outputs the fund raising grade corresponding to the object to be predicted;
Querying, according to the fund raising grade, the corresponding fund raising assessment report from a preset table of correspondence between fund raising grades and fund raising assessment reports.
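The formula for the fund raising prediction model is not reproduced in this text. As a hedged sketch only, a form consistent with the symbol definitions given above and with the usual gradient boosting objective would be the following, where reading $L$ as a loss function and $\Omega$ as a regularization term is an assumption taken from standard gradient boosting rather than from the text:

```latex
\mathrm{Obj} = \sum_{i=1}^{n} L\bigl(y_i,\ \hat{y}_i + f_t(x_t)\bigr) + \Omega(f_t) + \text{constant}
```

Under this reading, $\hat{y}_i$ is the estimate before the current tree is added and $f_t(x_t)$ is the correction contributed by the current tree.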
After querying, according to the fund raising grade, the corresponding fund raising assessment report from the preset table of correspondence between fund raising grades and fund raising assessment reports, the method further includes:
Calculating the minimum value of the fund raising prediction model and judging, based on the minimum value, the feasibility of the generated fund raising report, the minimization formula being:
Where $R_1(j, s) = \{x \mid x^{(j)} \le s\}$ and $R_2(j, s) = \{x \mid x^{(j)} > s\}$ are the value intervals of the dimension in the representation data, $i$ is the index of the label column entry, $j$ is the splitting feature of the representation data, $s$ is the split point, and $c_i$ is the absolute value corresponding to $y_i$.
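The minimization formula itself is missing from this text. Given the regions $R_1(j, s)$ and $R_2(j, s)$ defined above, one form that fits is the standard least-squares split criterion for regression trees, shown here as an assumption, with $c_1$ and $c_2$ denoting the constant $c_i$ fitted on each of the two regions:

```latex
\min_{j,\,s} \Bigl[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 \;+\; \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \Bigr]
```

For a fixed pair $(j, s)$, each inner minimum is attained when the constant equals the mean of the $y_i$ in that region.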
Specifically, the process of training on the obtained object data set with the LightGBM model to obtain the fund raising prediction model is as follows:
Assume the object data set has n instances $\{X_1, \ldots, X_n\}$ with feature dimension s. In each gradient iteration, the negative gradients of the loss function with respect to the model output are denoted $g_1, \ldots, g_n$. The decision tree assigns the data to nodes at the optimal split point (the point of maximum information gain), and the split data are then trained in the preset model architecture of the LightGBM model to obtain the final fund raising planning prediction model, whose function formula is as follows:
Where $y_i$ is the label value of the label column after the representation data are standardized; $f_t(X_t)$ is the approximate calculation function of the feature values in the obtained representation data; $X_t$ is the feature value corresponding to $y_i$; $j$ is the splitting feature of the representation data; $s$ is the split point; and constant denotes a constant term.
Further, a Taylor expansion of the above fund raising planning prediction model is calculated to obtain the approximate target value of the enterprise's fund raising plan, with the following formula:
Then the minimum value of the function of the above model is found, and the feasibility of the enterprise's fund raising plan is judged based on this minimum value, the minimization formula being:
Where $R_1(j, s) = \{x \mid x^{(j)} \le s\}$ and $R_2(j, s) = \{x \mid x^{(j)} > s\}$ are the value intervals of the dimension in the representation data, and $c_i$ is the absolute value corresponding to $y_i$.
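The Taylor expansion referred to above is likewise not reproduced in this text. As a hedged sketch, assuming the objective has the usual gradient boosting form with loss $L$ and regularization term $\Omega$, writing $g_i$ for the negative gradient as defined earlier and introducing $h_i$ (a symbol not in the original) for the second derivative of the loss, a second-order approximation would read:

```latex
\mathrm{Obj} \approx \sum_{i=1}^{n} \Bigl[ L(y_i, \hat{y}_i) - g_i\, f_t(x_t) + \tfrac{1}{2}\, h_i\, f_t(x_t)^2 \Bigr] + \Omega(f_t) + \text{constant},
\qquad
g_i = -\frac{\partial L(y_i, \hat{y}_i)}{\partial \hat{y}_i},
\quad
h_i = \frac{\partial^2 L(y_i, \hat{y}_i)}{\partial \hat{y}_i^2}
```

The first-order term carries a minus sign only because $g_i$ denotes the negative gradient here; with the gradient itself the sign would be positive.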
LightGBM uses a leaf-wise growth strategy: each time, it finds the leaf with the largest split gain among all current leaves (generally also the leaf with the most data), splits it, and repeats. Leaf-wise growth may, however, grow fairly deep decision trees and overfit, so LightGBM adds a maximum depth limit on top of leaf-wise growth, preventing overfitting while keeping high efficiency. LightGBM also optimizes support for categorical features: categorical features can be input directly without additional 0/1 expansion, and decision rules for categorical features are added to the decision tree algorithm. In data parallelism, reduce-scatter (Reduce scatter) is used to spread the task of merging histograms across different machines, reducing communication and computation, and histogram subtraction further halves the communication volume. Voting-based data parallelism (Parallel Voting) further optimizes the communication cost of data parallelism, making it constant.
In summary, the LightGBM model is robust and guards well against overfitting, and its performance is further optimized for speed, so that computation is faster and memory consumption is lower. This is an important reason why we chose the LightGBM model.
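The leaf-wise growth limit and the parallel learners mentioned above map onto LightGBM training parameters. The dictionary below is a sketch of such a configuration: the parameter names are LightGBM's own, while every value, the multiclass objective and the helper `leaves_fit_depth` are assumptions made for illustration.

```python
# Assumed hyperparameter values; only the parameter names are LightGBM's own.
params = {
    "objective": "multiclass",  # assumption: the fund raising grade is treated as a class
    "num_class": 10,            # assumption: 10 grades, as in the grading example
    "num_leaves": 31,           # cap on the leaves grown leaf-wise in each tree
    "max_depth": 6,             # depth limit that restrains leaf-wise overfitting
    "learning_rate": 0.1,
    "tree_learner": "data",     # data-parallel learner; "voting" selects parallel voting
}


def leaves_fit_depth(p):
    """True if num_leaves does not exceed what a tree of max_depth can hold."""
    return p["max_depth"] <= 0 or p["num_leaves"] <= 2 ** p["max_depth"]
```

A non-positive `max_depth` means no depth limit in LightGBM, which is why the helper accepts it unconditionally.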
According to the maximum fund raising range and the maximum bearing capacity grade of the assets, the corresponding fund raising scale is generated in combination with the fund raising plan prediction model.
In the present embodiment, the fund raising plan prediction model can be trained in advance on a large amount of fund raising planning data. The model can of course also be established from theory as required and then trained continually during actual prediction use, to improve its precision.
Further, after the internal representation data and external representation data of the object are graded according to the preset fund raising grades to obtain at least one data subset, the method further includes:
Scoring the at least one data subset with the weight ratio coefficients in a scoring model;
Selecting, according to the scoring result, the higher-scoring data subsets from the at least one data subset as the valid data set for training the fund raising prediction model.
In practical applications, when training the prediction model, the data are scored and screened by introducing weight ratios. The weight ratios can of course also be applied in the data preprocessing step, so that the data are screened in advance. This can be implemented with a scoring model based on known specific indicators such as enterprise fund raising, publicity strength and publicity market. We consider enterprise fund raising and publicity strength to be the most important scoring indicators and assume a weight of 0.3 for each, so that these two indicators together carry a weight of 0.6; the remaining indicators share the remaining weight equally, summing to 0.4. The specific values of each indicator are then normalized and weighted. In the end, a specific label value is quantified for each enterprise. This completes the combing of the label column (that is, the preprocessing of the data).
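The weighting scheme just described can be sketched as follows. The weights 0.3, 0.3 and the shared 0.4 come from the text; min-max normalization, the indicator names, the sample values and the function names are assumptions, and every indicator is assumed to be numeric already.

```python
def min_max(values):
    """Scale a list of numbers to the range [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def score_enterprises(table, key_indicators=("fund_raising", "publicity_strength")):
    """Weighted score per enterprise: 0.3 for each key indicator, 0.4 shared by the rest."""
    names = list(table)
    indicators = list(next(iter(table.values())))
    others = [k for k in indicators if k not in key_indicators]
    weights = {k: 0.3 for k in key_indicators}
    weights.update({k: 0.4 / len(others) for k in others})
    normalized = {k: min_max([table[n][k] for n in names]) for k in indicators}
    return {n: sum(weights[k] * normalized[k][i] for k in indicators)
            for i, n in enumerate(names)}


# Hypothetical indicator values for three enterprises.
table = {
    "A": {"fund_raising": 100, "publicity_strength": 5, "publicity_market": 3, "enterprise_scale": 1},
    "B": {"fund_raising": 50, "publicity_strength": 10, "publicity_market": 1, "enterprise_scale": 2},
    "C": {"fund_raising": 0, "publicity_strength": 0, "publicity_market": 2, "enterprise_scale": 3},
}
scores = score_enterprises(table)
```

The resulting scores could serve as the quantified label values, or be used to keep only the higher-scoring data subsets for training.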
In the present embodiment, in addition to predicting the fund raising scale with the model above, one may also choose to classify simply with some basic analysis mechanisms. This is less suitable when the data volume is large, and its logical rigor has certain loopholes. However, if resources or time are limited, part of the representation data can be extracted for simple classification: some correlation tests are run against the label data, and the representation data with higher correlation are used as the basis for division. This is also a simple and practical scheme.
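One possible form of such a correlation test is a Pearson correlation between each representation-data column and the label, keeping the columns above a threshold. The text does not name a specific test, so the choice of Pearson correlation, the threshold of 0.5, the column names and the sample values are all assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(columns, labels, threshold=0.5):
    """Keep the columns whose absolute correlation with the label reaches the threshold."""
    return [name for name, vals in columns.items()
            if abs(pearson(vals, labels)) >= threshold]


# Hypothetical columns and label values for four enterprises.
columns = {"annual_income": [1, 2, 3, 4], "host_city_code": [2, 1, 2, 1]}
labels = [1, 2, 3, 4]
chosen = select_features(columns, labels)
```

In this toy case only `annual_income` is kept, since its correlation with the label is 1 while that of `host_city_code` is about -0.45.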
Fig. 2 is a specific implementation flow chart of the data analysis method of the embodiment of the present invention, which trains a LightGBM model for fund raising prediction. The method specifically includes the following steps:
Step S210: through a communication connection over the internet, obtain, from data systems and websites related to the enterprise, the representation data of the enterprise for which fund raising planning prediction is to be performed;
In this step, the representation data obtained are specifically enterprise ranking, enterprise influence index, enterprise scale, enterprise annual income, enterprise type, enterprise sorting, the enterprise's historical annual fund raising strength, fund raising category, the publicity strength required by the enterprise, publicity market, host city and so on.
Step S220: perform data format processing and feature derivation on the representation data;
In this step, the acquired data are first put into tabular form, that is, stored in a data table under different header labels according to the kind of data. Not all of the acquired information is usable, however; some of it may be unneeded or redundant. The representation data are therefore standardized by masking the label column, filling missing columns with tree-model predictions, box-plot outlier checks, and so on.
Further, the feature data in each kind of representation data are bucketed, for example continuous data such as enterprise annual income are binned and counted and mapped to specific values. Year-on-year and period-on-period features are derived for items such as the enterprise's net income and debt.
In this step, data without a type label can also be arranged into the data table by constructing a label column. Specifically, for the portraits of different enterprises, such as enterprise fund raising, publicity strength, publicity market and host city, a quantified label is combed out with the scoring model.
In the present embodiment, features can also be graded according to the nature of the data themselves. For continuous data such as enterprise annual income, the data can be graded, for example with a factor variable of 10 for 1000W or more, 8 for 500W or more, and so on. Data of text type are likewise graded or discretized after text matching.
Step S230: train the fund raising prediction model on the processed representation data based on the LightGBM model.
In practical applications, the representation data are first split by cutting to obtain data subsets, and the split point of each subset is determined; training with the LightGBM model is then carried out on the subsets in combination with the split points. Take the LightGBM cutting strategy for a category set of red, yellow, blue and green as an example: it considers all possible strategies for dividing the four classes of samples corresponding to red, yellow, blue and green into two groups, for example red and yellow as one group and blue and green as the other. For k categories there are $2^{k-1} - 1$ such strategies; only in this way can the information contained in this feature dimension be fully mined and the optimal split strategy found. But the time complexity of finding the optimal split strategy in this way is very large. For regression trees there is an effective solution, which needs about $O(k \log k)$ to find the optimal partition. The basic idea is to reorder the categories according to their relation to the training objective. More specifically, the histogram (of the categorical feature) is re-sorted according to the accumulated value (sum_gradient / sum_hessian), and the best split point is then found in the sorted histogram. Based on the split point, the data set is input into the training architecture of the LightGBM model to obtain the following model formula:
A Taylor expansion of the above fund raising planning prediction model is calculated to obtain the approximate target value of the enterprise's fund raising plan, with the following formula:
Then the minimum value of the function of the above model is found, and the feasibility of the enterprise's fund raising plan is judged based on this minimum value, the minimization formula being:
Where $R_1(j, s) = \{x \mid x^{(j)} \le s\}$ and $R_2(j, s) = \{x \mid x^{(j)} > s\}$ are the value intervals of the dimension in the representation data.
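The sorted-histogram search for a categorical split described in step S230 can be sketched as follows. The per-category statistics, the function name `best_categorical_split` and the unregularized gain formula are assumptions for illustration; hessian sums are assumed to be positive.

```python
def best_categorical_split(stats):
    """Find a two-group split of categories; stats maps category -> (sum_gradient, sum_hessian).

    Categories are sorted by sum_gradient / sum_hessian, and every prefix of the
    sorted order is tried as the left group, so k categories need only k - 1
    candidate splits after the sort instead of every possible two-group partition.
    """
    order = sorted(stats, key=lambda c: stats[c][0] / stats[c][1])
    total_g = sum(g for g, _ in stats.values())
    total_h = sum(h for _, h in stats.values())
    best_gain, best_left = float("-inf"), None
    left_g = left_h = 0.0
    for cut in range(1, len(order)):
        g, h = stats[order[cut - 1]]
        left_g += g
        left_h += h
        right_g, right_h = total_g - left_g, total_h - left_h
        # Unregularized gain: score of the two children minus score of the parent.
        gain = left_g ** 2 / left_h + right_g ** 2 / right_h - total_g ** 2 / total_h
        if gain > best_gain:
            best_gain, best_left = gain, set(order[:cut])
    return best_left, best_gain


# Hypothetical gradient statistics for the four colour categories.
stats = {"red": (-4.0, 2.0), "yellow": (-3.0, 2.0), "blue": (3.0, 2.0), "green": (5.0, 2.0)}
left, gain = best_categorical_split(stats)
```

With these assumed statistics the best split groups red with yellow and blue with green, matching the example grouping in the text.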
Step S240: obtain the current data of the enterprise, input them into the fund raising prediction model, and output the predicted fund raising plan result.
Further, when training the model on the split data, weight ratios are also introduced to score and screen the data subsets after splitting. The weight ratios can of course also be applied in the data preprocessing step, so that the data are screened in advance. This can be implemented with a scoring model based on known specific indicators such as enterprise fund raising, publicity strength and publicity market. We consider enterprise fund raising and publicity strength to be the most important scoring indicators and assume a weight of 0.3 for each, so that these two indicators together carry a weight of 0.6; the remaining indicators share the remaining weight equally, summing to 0.4. The specific values of each indicator are then normalized and weighted. In the end, a specific label value is quantified for each enterprise. This completes the combing of the label column; model training is then carried out, which further improves the accuracy of the model's prediction results.
To solve the above problem, the present invention also provides a data analysis equipment that can be used to implement the data analysis method provided by the embodiments of the present invention. Physically it is implemented as a server, whose specific hardware implementation is shown in Figure 1.
Referring to Fig. 3, the server includes a processor 301 (such as a CPU), a communication bus 302, a user interface 303, a network interface 304 and a memory 305. The communication bus 302 provides the connection and communication between these components. The user interface 303 may include a display screen (Display) and an input unit such as a keyboard (Keyboard). The network interface 304 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 305 may be a high-speed RAM memory or a stable memory (non-volatile memory) such as a magnetic disk memory. The memory 305 may optionally also be a storage device independent of the aforementioned processor 301.
Those skilled in the art will understand that the hardware structure of the equipment shown in Fig. 3 does not constitute a limitation on the data analysis device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
As shown in Fig. 3, the memory 305, as a computer readable storage medium, may include an operating system, a network communication module, a user interface module and a data analysis program. The operating system is a program that manages the hardware and software resources of the data analysis device and supports the running of the data analysis program and other software and/or programs.
In the hardware configuration of the server shown in Fig. 3, the network interface 304 is mainly used for accessing the network; the user interface 303 is mainly used for communicating with the external internet or with servers that provide business data, to retrieve the enterprise's various credit and asset information; and the processor 301 can be used to call the data analysis program stored in the memory 305 and perform the operations of each of the following embodiments of the data analysis method.
In an embodiment of the present invention, the implementation of Fig. 3 can also be a touch-operated mobile terminal such as a mobile phone. The processor of the mobile terminal reads program code, stored in a buffer or storage unit, that implements the data analysis method, in order to carry out deduction and prediction of the enterprise's fund raising plan.
To solve the above problem, an embodiment of the present invention also provides a data analysis device. Referring to Fig. 4, Fig. 4 is a schematic diagram of the functional modules of the data analysis device provided by the embodiment of the present invention. In the present embodiment, the device includes:
Data acquisition module 41, for receiving the data analysis request of terminal transmission, and analysis request based on the data
In object to be analyzed, obtain corresponding object data set, the object data set includes at least inside object representation data and outer
Portion's representation data;
Data staging module 42 is used for representation data inside object and external representation data according to preset fund raising grade
Divided rank, obtains at least one data subset, and the data subset and the object to be analyzed correspond;
Calculation module 43, configured to calculate, according to the data subset, the maximum fund-raising range corresponding to the object to be analyzed and the maximum bearing ability grade of its assets;
Prediction module 44, configured to select a corresponding model training algorithm according to the maximum fund-raising range and the maximum bearing ability grade of its assets; to train on the data subset for fund-raising prediction according to the model training algorithm, obtaining a fund-raising prediction model; and to perform ecological engineering prediction on an object to be predicted based on the fund-raising prediction model and output a fund-raising prediction result.
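Claims 5 and 8 describe the fund-raising prediction model trained by the prediction module in terms of the symbols Obj, L, Ω, y_i, ŷ_i and f_t, but the formula itself does not survive in this text. Those symbols match the usual additive-tree (gradient boosting) objective, so one plausible reading, offered as an assumption and not as the patent's own formula, is:

```latex
\mathrm{Obj}^{(t)} \;=\; \sum_{i=1}^{n} L\bigl(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\bigr) \;+\; \Omega(f_t) \;+\; \text{constant}
```

In that standard form L is a loss function and Ω is a regularization term on the newly added tree f_t, whereas the claim wording groups both with the constant. The superscript (t-1) and the use of x_i as the input of sample i are likewise assumptions taken from the standard formulation.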
In this embodiment, the data analysis device further includes a format conversion module, configured to preprocess the representation data in the object data set. The preprocessing converts the representation data into the data format required by the data analysis system, yielding a standardized object data set.
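Claim 2 spells this conversion out: label columns are put into the expected order, missing columns are added and filled with blank data, and redundant columns are dropped. The following is a minimal sketch of that step on a single record held as a dictionary. The column names, the `standardize` function and the use of `None` as blank data are all hypothetical choices made for illustration.

```python
from typing import Any, Sequence

# Hypothetical label columns, in the order the training data set expects them.
EXPECTED_COLUMNS = ("object_id", "total_assets", "credit_score", "annual_revenue")


def standardize(record: dict[str, Any],
                expected: Sequence[str] = EXPECTED_COLUMNS,
                blank: Any = None) -> tuple[dict[str, Any], list[str], list[str]]:
    """Return (standardized record, missing columns, redundant columns)."""
    missing = [col for col in expected if col not in record]
    redundant = [col for col in record if col not in expected]
    # Building the dict in `expected` order reorders the columns, fills each
    # missing column with blank data, and drops every redundant column.
    standardized = {col: record.get(col, blank) for col in expected}
    return standardized, missing, redundant


# Hypothetical raw record: wrong order, one missing and one redundant column.
raw = {"credit_score": 712, "object_id": "E-001",
       "legacy_flag": True, "total_assets": 5_000_000}
clean, missing, redundant = standardize(raw)
print(clean)
print("missing:", missing, "redundant:", redundant)
```

This sketch deletes redundant columns outright. Claim 2 also allows masking them as invalid, which would need an extra flag per column.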
In this embodiment, the device further includes a judgment module, configured to calculate the minimum value of the fund-raising prediction model and to judge, based on that minimum value, the feasibility of the generated recommended fund-raising plan.
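Claim 6 defines two regions, R1(j, s) and R2(j, s), on either side of a split point s of feature j, which is the notation of a least-squares regression-tree split. The minimized formula is not reproduced in this text, so the sketch below assumes that standard criterion: choose j and s to minimize the summed squared error around each region's mean. The function name `best_split` and the sample data are hypothetical.

```python
def best_split(xs, ys):
    """Search every feature j and split point s; return (loss, j, s).

    xs is a list of feature vectors and ys the matching target values.
    Returns None when no split leaves both regions non-empty.
    """
    def sse(values):
        # Sum of squared errors around the mean of the region.
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    for j in range(len(xs[0])):
        for s in sorted({x[j] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[j] <= s]   # R1(j, s)
            right = [y for x, y in zip(xs, ys) if x[j] > s]   # R2(j, s)
            if not left or not right:
                continue  # skip splits that leave one region empty
            loss = sse(left) + sse(right)
            if best is None or loss < best[0]:
                best = (loss, j, s)
    return best


# Hypothetical data: feature 0 separates the two target levels cleanly.
xs = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]]
ys = [1.0, 1.0, 5.0, 5.0]
print(best_split(xs, ys))
```

How the resulting minimum is turned into a feasibility judgment is not specified in this text, so the sketch stops at the minimization.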
Because this embodiment rests on the same description as the data analysis method embodiments of the invention above, the content of the data analysis device embodiment is not repeated in detail here.
This embodiment obtains the internal representation data and external representation data of the enterprise according to the data analysis request and performs fund-raising planning analysis on the data to form a preliminary fund-raising analysis result. It then carries out a deduction of the enterprise's ecological balance based on the analysis result and generates a corresponding fund-raising plan for the enterprise, so that the enterprise forms a funding mechanism in which fund raising is reconciled with revenue during the fund-raising process. Funds are prepared on the basis of this funding mechanism, which avoids the mismatch between fund raising and income caused by blind fund-raising planning. At the same time, because the fund-raising plan is formed from both internal and external enterprise data, the precision of the system's planning analysis is greatly improved, the benefit of the enterprise's fund raising is maximized, and the enterprise's enthusiasm for raising funds for poverty alleviation is increased.
The present invention also provides a kind of computer readable storage mediums.
In this embodiment, a data analysis program is stored on the computer readable storage medium, and the data analysis program, when executed by a processor, implements the steps of the data analysis method described in any of the above embodiments. For the method implemented when the data analysis program is executed by the processor, reference may be made to the embodiments of the data analysis method of the invention, so it is not repeated in detail here.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the invention.
The embodiments of the invention have been described above with reference to the drawings, but the invention is not limited to the specific embodiments above, which are merely illustrative rather than restrictive. Inspired by the invention, those of ordinary skill in the art can devise many other forms without departing from the purpose of the invention and the scope protected by the claims. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of the invention, whether applied directly or indirectly in other related technical fields, falls within the protection of the invention.
Claims (10)
1. A data analysis method, characterized in that the data analysis method comprises:
Receiving a data analysis request sent by a terminal and, based on the object to be analyzed in the data analysis request, obtaining a corresponding object data set, the object data set including at least internal representation data and external representation data of the object;
Dividing the internal representation data and external representation data of the object into grades according to preset fund-raising grades to obtain at least one data subset, the data subsets corresponding one-to-one with the objects to be analyzed;
Calculating, according to the data subset, the maximum fund-raising range corresponding to the object to be analyzed and the maximum bearing ability grade of its assets;
Selecting a corresponding model training algorithm according to the maximum fund-raising range and the maximum bearing ability grade of its assets;
Training on the data subset for fund-raising prediction according to the model training algorithm to obtain a fund-raising prediction model, performing ecological engineering prediction on an object to be predicted based on the fund-raising prediction model, and outputting a fund-raising prediction result.
2. The data analysis method as claimed in claim 1, characterized in that, after the step of obtaining a corresponding object data set based on the object to be analyzed in the data analysis request, the method further comprises:
Obtaining the data format of the training data set used by the fund-raising prediction model, the data format including the label columns, the ordering of the label columns, and the storage locations of the data;
Adjusting the label columns in the internal representation data and external representation data to that ordering according to the data format, and detecting whether any label column is missing or redundant;
If a label column is missing from the external representation data and internal representation data, adding the missing label column to the internal representation data and external representation data and filling in blank data at the corresponding position, to form a standardized object data set;
If a redundant label column exists in the external representation data and internal representation data, deleting the redundant label column and its corresponding data from the data set, or masking them as invalid, to form a standardized object data set.
3. The data analysis method as claimed in claim 2, characterized in that, after dividing the internal representation data and external representation data of the object into grades according to preset fund-raising grades to obtain at least one data subset, the method further comprises:
Scoring the data subsets using the weight ratio coefficients in a preset scoring model to obtain scoring results;
Sorting the data subsets in descending order of the scoring results, and selecting the N highest-scoring data subsets as the valid data set for training the fund-raising prediction model, where N ≥ 1.
4. The data analysis method as claimed in any one of claims 1-3, characterized in that, after dividing the internal representation data and external representation data of the object into grades according to preset fund-raising grades to obtain at least one data subset, the method further comprises:
Performing feature analysis on the data subset to obtain the data features shared by the data in the data subset;
Performing feature derivation on the data features to obtain data similar to the data in the data subset, wherein the feature derivation further subdivides the data features or extends them to similar features.
5. The data analysis method as claimed in claim 4, characterized in that training on the data subset for fund-raising prediction according to the model training algorithm to obtain a fund-raising prediction model, performing ecological engineering prediction on the object to be predicted based on the fund-raising prediction model, and outputting a fund-raising prediction result comprises:
When training with the training algorithm of the Light GBM model, matching, according to the grade classification result of the data subset, the Light GBM model training architecture corresponding to the grade of the data subset, and inputting the data subset into the model architecture for training to obtain the fund-raising prediction model, wherein the fund-raising prediction model is:
[formula image not reproduced]
where Obj is the output result of the fund-raising prediction model, n > 1, y_i is the label value of the label column after the representation data has been standardized, ŷ_i is the estimated value of y_i, f_t(x_t) = f(x) is the approximate calculation function of the feature values in the representation data, f_t is the approximate target value when the feature value is x_t, i is the item number of the label column, x_t = x is the feature value corresponding to y_i, t is the item number of the feature value, and constant, L and Ω denote constant terms;
Obtaining the internal representation data and external representation data of the object to be predicted, inputting them into the fund-raising prediction model, and outputting the fund-raising grade corresponding to the object to be predicted;
Looking up, according to the fund-raising grade, the corresponding fund-raising assessment report in a preset table of correspondences between fund-raising grades and fund-raising assessment reports.
6. The data analysis method as claimed in claim 5, characterized in that, after looking up, according to the fund-raising grade, the corresponding fund-raising assessment report in the preset table of correspondences between fund-raising grades and fund-raising assessment reports, the method further comprises:
Calculating the minimum value of the fund-raising prediction model, and judging the feasibility of the generated fund-raising report based on the minimum value, the minimization formula being:
[formula image not reproduced]
where R_1(j, s) = {x | x^(j) ≤ s} and R_2(j, s) = {x | x^(j) > s} are, respectively, the value intervals of a dimension in the representation data, i is the item number of the label column, j is the splitting feature of the representation data, s is the split point, and c_i is the absolute value corresponding to y_i.
7. A data analysis device, characterized in that the data analysis device comprises:
A data acquisition module, configured to receive a data analysis request sent by a terminal and, based on the object to be analyzed in the data analysis request, obtain a corresponding object data set, the object data set including at least internal representation data and external representation data of the object;
A data grading module, configured to divide the internal representation data and external representation data of the object into grades according to preset fund-raising grades to obtain at least one data subset, the data subsets corresponding one-to-one with the objects to be analyzed;
A calculation module, configured to calculate, according to the data subset, the maximum fund-raising range corresponding to the object to be analyzed and the maximum bearing ability grade of its assets;
A prediction module, configured to select a corresponding model training algorithm according to the maximum fund-raising range and the maximum bearing ability grade of its assets; to train on the data subset for fund-raising prediction according to the model training algorithm, obtaining a fund-raising prediction model; and to perform ecological engineering prediction on an object to be predicted based on the fund-raising prediction model and output a fund-raising prediction result.
8. The data analysis device as claimed in claim 7, characterized in that the prediction module includes a model training unit and a report generation unit;
The model training unit is configured to, when training with the training algorithm of the Light GBM model, match, according to the grade classification result of the data subset, the Light GBM model training architecture corresponding to the grade of the data subset, and input the data subset into the model architecture for training to obtain the fund-raising prediction model, wherein the fund-raising prediction model is:
[formula image not reproduced]
where Obj is the output result of the fund-raising prediction model, n > 1, y_i is the label value of the label column after the representation data has been standardized, ŷ_i is the estimated value of y_i, f_t(x_t) = f(x) is the approximate calculation function of the feature values in the representation data, f_t is the approximate target value when the feature value is x_t, i is the item number of the label column, x_t = x is the feature value corresponding to y_i, t is the item number of the feature value, and constant, L and Ω denote constant terms;
The report generation unit is configured to obtain the internal representation data and external representation data of the object to be predicted, input them into the fund-raising prediction model, and output the fund-raising grade corresponding to the object to be predicted; and to look up, according to the fund-raising grade, the corresponding fund-raising assessment report in a preset table of correspondences between fund-raising grades and fund-raising assessment reports.
9. A data analysis equipment, characterized in that the data analysis equipment comprises: a memory, a processor, and a data analysis program stored on the memory and executable on the processor, the data analysis program, when executed by the processor, implementing the steps of the data analysis method as claimed in any one of claims 1-6.
10. A computer readable storage medium, characterized in that a data analysis program is stored on the computer readable storage medium, and the data analysis program, when executed by a processor, implements the steps of the data analysis method as claimed in any one of claims 1-6.
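The scoring step of claim 3 (score each data subset with preset weight coefficients, sort in descending order, keep the top N) can be sketched as follows. This is only an illustration under stated assumptions: the claim does not say how a subset is summarized or which weights are used, so the feature names, the weights and the helper names `score` and `top_n` are hypothetical.

```python
def score(subset, weights):
    """Weighted sum of a subset's summary features (hypothetical scoring model)."""
    return sum(weight * subset[name] for name, weight in weights.items())


def top_n(subsets, weights, n=1):
    """Sort subsets by score, highest first, and keep the first n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ranked = sorted(subsets, key=lambda s: score(s, weights), reverse=True)
    return ranked[:n]


# Hypothetical weight ratio coefficients and per-subset summary features.
weights = {"credit": 0.6, "assets": 0.4}
subsets = [
    {"name": "A", "credit": 0.9, "assets": 0.5},
    {"name": "B", "credit": 0.4, "assets": 0.8},
    {"name": "C", "credit": 0.7, "assets": 0.9},
]
selected = top_n(subsets, weights, n=2)
print([s["name"] for s in selected])
```

The selected subsets would then serve as the valid data set for training the fund-raising prediction model.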
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910479378.7A CN110310012B (en) | 2019-06-04 | 2019-06-04 | Data analysis method, device, equipment and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910479378.7A CN110310012B (en) | 2019-06-04 | 2019-06-04 | Data analysis method, device, equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110310012A (en) | 2019-10-08 |
CN110310012B (en) | 2023-07-28 |
Family
ID=68075289
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910479378.7A Active CN110310012B (en) | 2019-06-04 | 2019-06-04 | Data analysis method, device, equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110310012B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110956303A (en) * | 2019-10-12 | 2020-04-03 | 未鲲(上海)科技服务有限公司 | Information prediction method, device, terminal and readable storage medium |
CN112598341A (en) * | 2021-03-08 | 2021-04-02 | 工福(北京)科技发展有限公司 | Data processing system and method for idle article poverty alleviation platform |
CN114298427A (en) * | 2021-12-30 | 2022-04-08 | 北京金堤科技有限公司 | Enterprise attribute data prediction method and device, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101149821A (en) * | 2007-05-15 | 2008-03-26 | 佟辛 | Complete information based dynamically interactive type enterprise finance model construction and operation method |
US20140136294A1 (en) * | 2012-11-13 | 2014-05-15 | Creat Llc | Comprehensive quantitative and qualitative model for a real estate development project |
CN105719069A (en) * | 2016-01-15 | 2016-06-29 | 中国南方电网有限责任公司电网技术研究中心 | Method and system for measuring enterprise cash flows |
KR101750825B1 (en) * | 2016-04-01 | 2017-06-26 | 주식회사 조이펀 | A crowd funding platform system for securing investors by continuously providing objective investing information and the funding method by using the same |
CN107767259A (en) * | 2017-09-30 | 2018-03-06 | 平安科技(深圳)有限公司 | Loan risk control method, electronic installation and readable storage medium storing program for executing |
Also Published As
Publication number | Publication date |
---|---|
CN110310012B (en) | 2023-07-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||