CN110310012A - Data analysing method, device, equipment and computer readable storage medium - Google Patents

Publication number
CN110310012A
Authority
CN
China
Prior art keywords
data
fund raising
grade
model
subset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910479378.7A
Other languages
Chinese (zh)
Other versions
CN110310012B (en)
Inventor
陈娴娴
阮晓雯
徐亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910479378.7A
Publication of CN110310012A
Application granted
Publication of CN110310012B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G06Q10/06375 Prediction of business process outcome or impact based on a proposed change
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/90 Financial instruments for climate change mitigation, e.g. environmental taxes, subsidies or financing

Abstract

The present invention relates to the field of artificial intelligence and discloses a data analysis method comprising: obtaining the internal representation data and external representation data of an object according to a data analysis request and grading them; calculating the object's maximum fund raising range and asset bearing capacity grade; selecting a corresponding model training algorithm according to the maximum fund raising range and the asset bearing capacity grade and training a fund raising prediction model; and outputting a fund raising prediction result based on the enterprise data of the object to be predicted and the model. The invention also discloses a data analysis device, equipment and a computer-readable storage medium. Funds are prepared on the basis of this fund raising mechanism, which avoids the mismatch between fund raising and income caused by an enterprise raising funds blindly. Because the fund raising plan is formed from both internal and external enterprise data, the precision of the system's planning analysis is greatly improved, the benefit of the enterprise's fund raising is maximised, and the enterprise's enthusiasm for raising funds for poverty alleviation is increased.

Description

Data analysing method, device, equipment and computer readable storage medium
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a data analysis method, device, equipment and computer-readable storage medium.
Background
With the continuing development of artificial intelligence, AI can save enterprises considerable human resources, particularly in data statistics and operations planning. In current technology, however, an enterprise's planning analysis system is deployed inside the enterprise and the data in the system must remain confidential, so the system generally cannot connect to external networks. When such a system analyses data, it usually relies on the enterprise's internal historical data up to the year being planned and on whatever data is still available. External information has to be imported continually from external networks, so the data is not updated in time, which leads to deviation and inaccuracy in the analysis. In addition, such systems do little analysis of large-scale assets and maximum capacity, so the enterprise's own planning tends towards one of two extremes: shortfall or surplus. A shortfall can seriously threaten the enterprise's survival, while a surplus limits its development.
This is especially true when an enterprise raises funds. If data is not updated in time, the system's analysis is incomplete. Because of the limits of the enterprise's own development and capital, the analysis can easily exceed what the enterprise can bear, which makes the plan inaccurate. It can be seen that no system or method for high-level analysis of fund raising mechanisms currently exists. As a result, data analysis is inaccurate and inefficient, fund raising cannot reasonably match the feedback mechanism suited to enterprises of different scales, enterprises operate poorly and face considerable fund raising risk, and the likelihood that fund raising and income are matched is reduced.
Summary of the invention
The main purpose of the present invention is to provide a data analysis method, device, equipment and computer-readable storage medium, intended to solve the technical problem that a system's analysis of enterprise fund raising plans is inaccurate because existing data is not updated in time.
To achieve the above purpose, the present invention provides a data analysis method, which includes:
receiving a data analysis request sent by a terminal and, based on the object to be analysed in the data analysis request, obtaining a corresponding object data set, the object data set including at least the internal representation data and external representation data of the object;
grading the internal representation data and external representation data of the object according to preset fund raising grades to obtain at least one data subset, the data subsets corresponding one-to-one with the objects to be analysed;
calculating, according to the data subset, the maximum fund raising range of the object to be analysed and the maximum bearing capacity grade of its assets;
selecting a corresponding model training algorithm according to the maximum fund raising range and the maximum bearing capacity grade of the assets;
training on the data subset for fund raising prediction according to the model training algorithm to obtain a fund raising prediction model, performing ecological engineering prediction on an object to be predicted based on the fund raising prediction model, and outputting a fund raising prediction result.
Optionally, after the step of obtaining a corresponding object data set based on the object to be analysed in the data analysis request, the method further includes:
obtaining the data format of the data set used to train the fund raising prediction model, the data format including the label columns, the order of the label columns and the storage location of the data;
adjusting the label columns in the internal representation data and external representation data to that order according to the data format, and detecting whether any label column is missing or redundant;
if a label column is missing from the external representation data and internal representation data, adding the missing label column at the corresponding position in the internal representation data and external representation data and filling it with blank data, so as to form a standardised object data set;
if a redundant label column exists in the external representation data and internal representation data, deleting the redundant label column and its corresponding data from the data set, or masking them and marking them invalid, so as to form a standardised object data set.
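The format standardisation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `standardize_records`, the column names and the choice of `None` as the blank value are all assumptions made for the example, and redundant columns are simply dropped rather than masked.

```python
from __future__ import annotations

from typing import Any


def standardize_records(
    records: list[dict[str, Any]],
    expected_columns: list[str],
    blank: Any = None,
) -> tuple[list[dict[str, Any]], set[str], set[str]]:
    """Reorder label columns, add missing ones as blanks, drop redundant ones."""
    expected = set(expected_columns)
    missing: set[str] = set()
    redundant: set[str] = set()
    standardized = []
    for record in records:
        present = set(record)
        missing |= expected - present      # columns the format requires but the record lacks
        redundant |= present - expected    # columns the format does not know about
        # Rebuild the record in the expected order; absent columns get the blank value.
        standardized.append({col: record.get(col, blank) for col in expected_columns})
    return standardized, missing, redundant


# Hypothetical representation data for two enterprises.
records = [
    {"annual_income": 120.0, "enterprise_scale": "large", "fax": "n/a"},
    {"enterprise_scale": "small", "ranking": 42},
]
columns = ["ranking", "enterprise_scale", "annual_income"]
rows, missing, redundant = standardize_records(records, columns)
```

Returning the sets of missing and redundant columns alongside the cleaned rows keeps the detection step visible, so a caller could log it or decide to mask rather than delete.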
Optionally, after grading the internal representation data and external representation data of the object according to preset fund raising grades to obtain at least one data subset, the method further includes:
scoring the at least one data subset using the weight ratio coefficients in a preset scoring model to obtain a scoring result;
sorting the data subsets in descending order according to the scoring result, and selecting the N highest-scoring data subsets as the valid data set for training the fund raising prediction model, where N ≥ 1.
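As a rough illustration of the scoring and selection step, the sketch below treats the scoring model as a weighted sum. The source does not say how the weight ratio coefficients are combined, so the weighted sum, the feature names and the helper names `score_subset` and `select_top_n` are assumptions.

```python
from __future__ import annotations


def score_subset(features: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of the features named in the weights table (assumed scoring rule)."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


def select_top_n(
    subsets: dict[str, dict[str, float]],
    weights: dict[str, float],
    n: int = 1,
) -> list[str]:
    """Return the keys of the n highest-scoring subsets, highest score first."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ranked = sorted(subsets, key=lambda key: score_subset(subsets[key], weights), reverse=True)
    return ranked[:n]


# Hypothetical weights and normalised feature values.
weights = {"influence_index": 0.5, "annual_income": 0.3, "past_fund_raising": 0.2}
subsets = {
    "enterprise_a": {"influence_index": 0.9, "annual_income": 0.4, "past_fund_raising": 0.1},
    "enterprise_b": {"influence_index": 0.2, "annual_income": 0.8, "past_fund_raising": 0.9},
    "enterprise_c": {"influence_index": 0.5, "annual_income": 0.5, "past_fund_raising": 0.5},
}
top_two = select_top_n(subsets, weights, n=2)
```

With these made-up numbers the scores are roughly 0.59, 0.52 and 0.50, so `top_two` holds `enterprise_a` followed by `enterprise_b`.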
Optionally, after grading the internal representation data and external representation data of the object according to preset fund raising grades to obtain at least one data subset, the method further includes:
performing feature analysis on the data subset to obtain the data features shared by the data in the data subset;
performing feature derivation on the data features to obtain data similar to the data in the data subset, where feature derivation means further subdividing the data features or extending them to similar features.
Optionally, training on the data subset for fund raising prediction according to the model training algorithm to obtain a fund raising prediction model, performing ecological engineering prediction on the object based on the fund raising prediction model, and outputting a fund raising prediction result includes:
when training with the training algorithm of the LightGBM model, matching, according to the grading result of the data subset, the LightGBM model training architecture corresponding to the grade of the data subset, and inputting the data subset into the model architecture for training to obtain the fund raising prediction model, where the fund raising prediction model is:
where Obj is the output result of the fund raising prediction model, n > 1, y_i is the label value of a label column after the representation data has been standardised, ŷ_i is the estimated value of y_i, f_t(x_t) = f(x) is the approximate calculation function of the feature values in the representation data, f_t is the approximate target value when the feature value is x_t, i is the index of the label column, x_t = x is the feature value corresponding to y_i, t is the index of the feature value, and constant, L and Ω denote constant terms;
obtaining the internal representation data and external representation data of the object to be predicted, inputting them into the fund raising prediction model, and outputting the fund raising grade corresponding to the object to be predicted;
querying, according to the fund raising grade, the corresponding fund raising assessment report from a preset table of correspondences between fund raising grades and fund raising assessment reports.
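The formula for the fund raising prediction model referred to above does not appear in this text. The symbols that are listed (Obj, y_i, ŷ_i, f_t, L, Ω and constant) match the additive training objective commonly written for gradient-boosted trees, the family to which LightGBM belongs. As an assumption only, and not a statement of the patent's exact formula, that standard objective reads:

```latex
\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} L\bigl(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\bigr) + \Omega(f_t) + \text{constant}
```

In this standard form L is a loss function, Ω is a regularisation term on the newly added tree f_t, and t indexes boosting rounds, which differs from the way the text above describes those symbols.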
Optionally, after querying, according to the fund raising grade, the corresponding fund raising assessment report from the preset table of correspondences between fund raising grades and fund raising assessment reports, the method further includes:
calculating the minimum value of the fund raising prediction model and judging the feasibility of the generated fund raising report based on the minimum value, where the minimisation formula is:
where R_1(j, s) = {x | x^(j) ≤ s} and R_2(j, s) = {x | x^(j) > s} are the value intervals of a dimension in the representation data, i is the index of the label column, j is the splitting feature of the representation data, s is the split point, and c_i is the absolute value corresponding to y_i.
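The minimisation formula itself is not reproduced in this text. The regions R1(j, s) and R2(j, s) are the ones used in the least-squares split search of regression trees, where one looks for the feature j and split point s that minimise the summed squared deviation of the y_i from a value c_1 in R1 and a value c_2 in R2. The sketch below assumes that reading and takes c_1 and c_2 to be the region means, as in standard CART; the function name `best_split` and the sample data are hypothetical.

```python
from __future__ import annotations


def best_split(rows: list[list[float]], targets: list[float]) -> tuple[int, float, float]:
    """Return (feature index j, split point s, squared-error loss) of the best binary split."""
    best = None
    for j in range(len(rows[0])):
        for s in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, targets) if row[j] <= s]   # targets in R1(j, s)
            right = [y for row, y in zip(rows, targets) if row[j] > s]   # targets in R2(j, s)
            if not left or not right:
                continue  # a split must leave data on both sides
            c1 = sum(left) / len(left)     # assumed: region mean minimises squared error
            c2 = sum(right) / len(right)
            loss = sum((y - c1) ** 2 for y in left) + sum((y - c2) ** 2 for y in right)
            if best is None or loss < best[2]:
                best = (j, s, loss)
    if best is None:
        raise ValueError("no valid split: every feature is constant")
    return best


rows = [[1.0, 10.0], [2.0, 10.0], [3.0, 20.0], [4.0, 20.0]]
targets = [1.0, 1.0, 5.0, 5.0]
j, s, loss = best_split(rows, targets)
```

The exhaustive scan is quadratic in the number of rows and is meant only to make the criterion concrete; histogram-based libraries such as LightGBM search for splits far more efficiently.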
In addition, to achieve the above purpose, the present invention also provides a data analysis device, which includes:
a data acquisition module, for receiving a data analysis request sent by a terminal and, based on the object to be analysed in the data analysis request, obtaining a corresponding object data set, the object data set including at least the internal representation data and external representation data of the object;
a data grading module, for grading the internal representation data and external representation data of the object according to preset fund raising grades to obtain at least one data subset, the data subsets corresponding one-to-one with the objects to be analysed;
a computing module, for calculating, according to the data subset, the maximum fund raising range of the object to be analysed and the maximum bearing capacity grade of its assets;
a prediction module, for selecting a corresponding model training algorithm according to the maximum fund raising range and the maximum bearing capacity grade of the assets; training on the at least one data subset for fund raising prediction according to the model training algorithm to obtain a fund raising prediction model; performing ecological engineering prediction on an object to be predicted based on the fund raising prediction model; and outputting a fund raising prediction result.
Optionally, the data analysis device further includes a format conversion module, for obtaining the data format of the data set used to train the fund raising prediction model, the data format including the label columns, the order of the label columns and the storage location of the data; adjusting the label columns in the internal representation data and external representation data to that order according to the data format, and detecting whether any label column is missing or redundant; if a label column is missing from the external representation data and internal representation data, adding the missing label column at the corresponding position in the internal representation data and external representation data and filling it with blank data, so as to form a standardised object data set; and if a redundant label column exists in the external representation data and internal representation data, deleting the redundant label column and its corresponding data from the data set, or masking them and marking them invalid, so as to form a standardised object data set.
Optionally, the data analysis device further includes a scoring module, for scoring the data subsets using the weight ratio coefficients in a preset scoring model to obtain a scoring result; and sorting the data subsets in descending order according to the scoring result and selecting the N highest-scoring data subsets as the valid data set for training the fund raising prediction model, where N ≥ 1.
Optionally, the data analysis device further includes a derivation module, for performing feature analysis on the data subset to obtain the data features shared by the data in the data subset; and performing feature derivation on the data features to obtain data similar to the data in the data subset, where feature derivation means further subdividing the data features or extending them to similar features.
Optionally, the prediction module includes a model training unit and a report generation unit.
The model training unit is for, when training with the training algorithm of the LightGBM model, matching, according to the grading result of the data subset, the LightGBM model training architecture corresponding to the grade of the data subset, and inputting the data subset into the model architecture for training to obtain the fund raising prediction model, where the fund raising prediction model is:
where Obj is the output result of the fund raising prediction model, n > 1, y_i is the label value of a label column after the representation data has been standardised, ŷ_i is the estimated value of y_i, f_t(x_t) = f(x) is the approximate calculation function of the feature values in the representation data, f_t is the approximate target value when the feature value is x_t, i is the index of the label column, x_t = x is the feature value corresponding to y_i, t is the index of the feature value, and constant, L and Ω denote constant terms.
The report generation unit is for obtaining the internal representation data and external representation data of the object to be predicted, inputting them into the fund raising prediction model, and outputting the fund raising grade corresponding to the object to be predicted; and querying, according to the fund raising grade, the corresponding fund raising assessment report from a preset table of correspondences between fund raising grades and fund raising assessment reports.
Optionally, the data analysis device further includes a judgment module, for calculating the minimum value of the fund raising prediction model and judging the feasibility of the generated fund raising report based on the minimum value, where the minimisation formula is:
where R_1(j, s) = {x | x^(j) ≤ s} and R_2(j, s) = {x | x^(j) > s} are the value intervals of a dimension in the representation data, i is the index of the label column, j is the splitting feature of the representation data, s is the split point, and c_i is the absolute value corresponding to y_i.
In addition, to achieve the above purpose, the present invention also provides data analysis equipment, which includes a memory, a processor, and a data analysis program stored on the memory and executable on the processor, the data analysis program implementing the steps of the data analysis method described in any of the above when executed by the processor.
In addition, to achieve the above purpose, the present invention also provides a computer-readable storage medium on which a data analysis program is stored, the data analysis program implementing the steps of the data analysis method described in any of the above when executed by a processor.
The present invention obtains the internal representation data and external representation data of an enterprise according to a data analysis request, carries out planning analysis of the fund raising data to form a preliminary fund raising analysis result, then deduces the enterprise's ecological balance from the analysis result and generates a corresponding fund raising plan, thereby forming a mechanism in which the enterprise's fund raising and income are balanced during fund raising. Funds are prepared on the basis of this mechanism, which avoids the mismatch between fund raising and income caused by an enterprise raising funds blindly. Because the fund raising plan is formed from both internal and external enterprise data, the precision of the system's planning analysis is greatly improved, the benefit of the enterprise's fund raising is maximised, and the enterprise's enthusiasm for raising funds for poverty alleviation is increased.
Description of the drawings
Fig. 1 is a flow diagram of the first embodiment of the data analysis method provided by the present invention;
Fig. 2 is a flow diagram of the second embodiment of the data analysis method provided by the present invention;
Fig. 3 is a functional block diagram of one embodiment of the data analysis device provided by the present invention;
Fig. 4 is a structural diagram of the server running environment involved in the embodiments of the present invention.
The realisation of the purpose, the functional features and the advantages of the present invention are further described below in conjunction with the embodiments and with reference to the accompanying drawings.
Specific embodiments
It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.
In the present invention, the data analysis method provided generally refers to a fund raising planning prediction method that plans fund raising data so that an enterprise's poverty alleviation fund raising and income are balanced; it can of course also be used for the planning analysis of other business. The method can be implemented in an existing poverty alleviation fund raising system, preferably by adding software code that implements the method to that system. The system can be physically implemented as a personal computer (PC), a server, a smartphone and so on. Based on such a hardware structure, the embodiments of the data analysis method of the present invention are proposed.
Referring to Fig. 1, Fig. 1 is a flow chart of the data analysis method provided by an embodiment of the present invention. In this embodiment, the data analysis method includes the following steps:
Step S110: receive a data analysis request sent by a terminal and, based on the object to be analysed in the data analysis request, obtain a corresponding object data set.
In this step, the object data set includes at least the internal representation data and external representation data of the object. The object data set can be obtained from an existing enterprise credit system or from review websites on the internet, and is mainly used to judge the enterprise's resources and bearing capacity. The representation data includes the enterprise's ranking, enterprise influence index, enterprise scale, annual income, enterprise type, enterprise classification, fund raising intensity in past years, fund raising category, the publicity intensity the enterprise requires, the publicity market, and venues for marketing or exhibitions, among others.
The enterprise ranking includes rankings by operating condition, tax payment, total assets and credit standing in the enterprise's locality, and may also include a ranking by profit. Depending on the application, the enterprise's national ranking may be obtained instead.
Step S120: grade the internal representation data and external representation data of the object according to preset fund raising grades to obtain at least one data subset, the data subsets corresponding one-to-one with the objects to be analysed.
Step S130: calculate, according to the data subset, the maximum fund raising range of the object to be analysed and the maximum bearing capacity grade of its assets.
In this embodiment, the object is an enterprise. In other words, steps S120 and S130 carry out planning analysis of the fund raising data on the enterprise data set to obtain a preliminary fund raising analysis result for the enterprise, which includes the enterprise's maximum fund raising range and the maximum bearing capacity grade of its assets. The planning analysis in these steps mainly analyses the bearing capacity of the enterprise's assets. This bearing capacity reflects the enterprise's state of development relatively well and also helps in preparing the enterprise's future development plan. Fund raising is one way for an enterprise to develop: it can serve the enterprise's own development and can also provide support to others.
Calculating the asset bearing capacity requires combining the enterprise's tangible and intangible assets. Intangible assets are what the outside world confers on an enterprise that is well managed; they can be regarded as a measure of enterprise credit and are a trust resource that backs the enterprise when it actually raises funds.
The maximum fund raising range and the bearing capacity grade must also take into account the field in which the enterprise operates, for example its main direction of development or the products it makes. They are calculated according to the enterprise type and the industry it serves; not every enterprise can raise funds freely in any industry or field.
For example, the maximum fund raising capacity the enterprise can currently bear is judged from its current net income and debt position, and the fund raising amount is determined from the grade of that capacity. Once the fund raising amount is determined, the enterprise's bearing capacity grade is determined; this grade can be calculated by jointly considering factors such as the enterprise's current business trend, actual income and expenditure, and state of development.
Step S140: select a corresponding model training algorithm according to the maximum fund raising range and the maximum bearing capacity grade of the assets.
In practical applications, the model training algorithm can be selected by means of a correspondence table. That is, the user estimates the company's fund raising range in advance from actual fund raising conditions and from factors such as the company's assets and income, and divides the fund raising range into grades such as low, medium and high. A corresponding model training algorithm is then chosen for each grade, and a mapping table is created. In actual use, the maximum fund raising range and the maximum bearing capacity grade of the assets serve as the search conditions, and the corresponding model training algorithm is selected from the mapping table.
Of course, in this step the algorithm need not be selected through a correspondence table; it can also be determined from the company's fund raising history. For example, the company's internal fund raising history is searched using the maximum fund raising range and the maximum bearing capacity grade of the assets, records of approximately the same grade are selected, and the model training algorithm in those records is extracted, thereby selecting the model training algorithm.
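The mapping-table lookup described above might look like the following sketch. The grade labels follow the low, medium and high grades mentioned in the text; the table contents, the algorithm names other than LightGBM and the function name `select_training_algorithm` are assumptions made for illustration.

```python
from __future__ import annotations

GRADES = ("low", "medium", "high")

# Hypothetical mapping: (fund raising range grade, asset bearing capacity grade) -> algorithm name.
ALGORITHM_TABLE: dict[tuple[str, str], str] = {
    ("low", "low"): "decision_tree",
    ("medium", "medium"): "random_forest",
    ("high", "high"): "lightgbm",
}


def select_training_algorithm(range_grade: str, bearing_grade: str) -> str:
    """Look up the model training algorithm registered for a pair of grades."""
    for grade in (range_grade, bearing_grade):
        if grade not in GRADES:
            raise ValueError(f"unknown grade: {grade!r}")
    try:
        return ALGORITHM_TABLE[(range_grade, bearing_grade)]
    except KeyError:
        # No entry was prepared in advance for this combination.
        raise LookupError(f"no algorithm registered for {(range_grade, bearing_grade)}") from None


chosen = select_training_algorithm("high", "high")
```

Raising an error for an unregistered combination, rather than silently falling back to a default, is a design choice of this sketch; a real system could instead fall back to the history-based selection.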
Step S150: train on the at least one data subset for fund raising prediction according to the model training algorithm to obtain a fund raising prediction model, perform ecological engineering prediction on an object to be predicted based on the fund raising prediction model, and output a fund raising prediction result.
In this embodiment, the object to be predicted is the name of the enterprise for which the user needs a fund raising planning forecast, while the objects to be analysed are the names of the enterprises used for model training; there may be several of them, and they mainly serve to provide the data for training the model. Predicting fund raising with the model is essentially a process of deducing the enterprise's overall ecological balance of income and expenditure. This deduction concerns the degree of balance between the enterprise's fund raising and its income: based on the preliminary fund raising analysis result, it simulates the enterprise raising funds and determines whether the maximum fund raising saturation can be reached while the enterprise's survival is guaranteed. Specifically, the balance between fund raising and income can be deduced and calculated from the maximum fund raising range and the maximum bearing capacity grade of the assets. The corresponding fund raising amount is determined from that balance, and a fund raising plan can be drawn up in advance or preparations for fund raising can begin directly according to the fund raising scale.
In this embodiment, step S120 carries out a preliminary grading of the enterprise using data in its representation data such as enterprise ranking, enterprise influence index, enterprise scale, annual income, enterprise type, enterprise classification, fund raising intensity in past years, fund raising category, required publicity intensity and publicity market. For example, the enterprise's operating condition is analysed preliminarily from the enterprise influence index, annual income, fund raising intensity in past years and publicity situation. If the operating condition is good, a deeper calculation and analysis is carried out, namely a loss analysis that combines more of the enterprise's representation data, to obtain the final fund raising intensity and, given that intensity, the maximum bearing capacity grade of the enterprise's assets.
Determining the corresponding fund raising plan by assessing an enterprise in this way allows the enterprise to carry out fund raising more effectively, which increases its enthusiasm for raising funds for poverty alleviation and also ensures that poverty alleviation funds are fully implemented and used.
In the present embodiment, in step S110, after obtaining the corresponding object data set based on the object to be analyzed in the data analysis request, the method further includes:
Pre-processing the representation data of the object data set, where the pre-processing converts the representation data into the data format required by the data analysis system, to obtain a standardized object data set.
In practical applications, standardizing the data format essentially means converting the enterprises' representation data in these object data sets into data of a fixed format. This simplifies subsequent computation, prevents disordered data from affecting the results, improves the accuracy of the computation, and ultimately improves the prediction of the enterprise's fund raising scale.
In the present embodiment, pre-processing the representation data of the object data set includes:
Obtaining the data format of the training data set used by the fund raising prediction model, the data format including the label columns, the ordering of the label columns and the storage positions of the data;
Reordering the label columns in the internal representation data and external representation data according to the ordering given by the data format, and detecting whether any label column is missing or redundant;
If a label column is missing from the external representation data and internal representation data, adding the missing label column at the corresponding position in the internal representation data and external representation data and filling in blank data, to form a standardized object data set;
If a redundant label column exists in the external representation data and internal representation data, deleting the redundant label column and its data from the data set, or masking them and marking them as invalid, to form a standardized object data set.
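The column alignment described in the steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `standardize`, the use of dictionaries for rows, and `None` as the blank value are all assumptions.

```python
def standardize(rows, label_columns, blank=None):
    """Align each row to the label columns of the training format.

    rows: list of dicts (hypothetical representation of the data table).
    label_columns: column names in the order required by the training format.
    Returns the aligned rows plus the sets of missing and redundant columns.
    """
    missing, redundant = set(), set()
    aligned_rows = []
    for row in rows:
        # Detect columns required by the format but absent from this row.
        missing.update(c for c in label_columns if c not in row)
        # Detect columns present in this row but not part of the format.
        redundant.update(c for c in row if c not in label_columns)
        # Reorder to the training order; missing columns get the blank value,
        # redundant columns are dropped.
        aligned_rows.append({c: row.get(c, blank) for c in label_columns})
    return aligned_rows, missing, redundant


rows = [{"income": 5, "rank": 2, "notes": "x"}, {"rank": 1}]
aligned, missing, redundant = standardize(rows, ["rank", "income", "market"])
```

Here redundant columns are deleted; masking them with an "invalid" flag, the alternative the text allows, would keep the column and add a marker instead.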
That is, the raw data are standardized by masking the label column, filling missing columns with values predicted by a tree model, applying a box-plot outlier test, and so on. Nothing is added to the raw data; it is only cleaned with data-cleaning methods. The label column is masked precisely because it is the label: apart from model training and validation, the label column should be used as sparingly as possible, because this column is extremely important. When filling missing data with a tree model, we therefore prefer to remove this column so that its values have less influence, and to consider only the other representation data. In other words, the label column is not used here; in data cleaning and feature engineering it is masked in many cases.
In practical applications, the representation data sets obtained are generally stored in fairly regular data tables, and when the tables are exported by the enterprise or by a statistics company they come with various label columns. These label columns are not needed for the fund raising plan prediction of this application. The pre-processing here can therefore mask the label column in the obtained representation data, or correct the format of the data by checking the representation data for missing or redundant items.
For example, if the check finds missing data, a label column is added so that empty parameters are inserted into the representation data and it meets the preset format requirement; preferably, the representation data are standardized by filling missing columns with tree-model predictions, applying a box-plot outlier test, and similar methods. If the check finds redundant data, the redundant information is removed by deleting the data or masking the label column.
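The box-plot outlier test mentioned above is not specified further in the text. A minimal sketch, assuming the common 1.5 x IQR whisker rule (the whisker length and the function name `boxplot_outliers` are assumptions):

```python
import statistics


def boxplot_outliers(values, whisker=1.5):
    """Return the values lying outside the box-plot whiskers."""
    # Quartiles of the data; "inclusive" treats the data as the full population.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]


# One annual-income-like column with a single extreme value.
suspicious = boxplot_outliers([10, 12, 11, 13, 12, 11, 100])
```

Values flagged this way could then be treated as missing and filled like any other gap.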
Applying the same format processing to all raw data in this way standardizes the data and prevents format diversity from causing deviations in the subsequent grade assessment.
In the present embodiment, performing fund raising planning analysis on the obtained object data set to get the enterprise's preliminary fund raising analysis result includes:
Grading the pre-processed representation data according to preset grades to obtain a plurality of data subsets, the data subsets corresponding one-to-one to the enterprises;
Calculating, from the data subsets, the maximum fund raising range and the maximum bearing ability grade of the assets of the corresponding enterprise in the representation data.
The representation data are themselves a data set comprising external representation data and internal representation data. Grading the representation data set can be implemented by feature bucketing: in practice, data features are set for each grade in advance, and at grading time the preset data features are compared with the data features in the representation data set to grade it.
In practical applications, the data features of each grade correspond to a different fund raising grade. For example, suppose enterprises are divided into 10 fund raising grades. During grading, the keywords in the representation data, already converted to the prescribed format, are compared with the data features of each of the 10 grades, and the grade is determined from the comparison result. Suppose the representation data list one type of business data for the enterprise over a period of time. The representation data are first segmented into small time intervals, giving multiple small sets; each small set is then compared with the data features of the 10 grades, and when its data feature reaches the value for a grade, the small set is assigned to that grade. After the comparison, the small sets form at least one data subset.
Take as an example a representation data set consisting of continuous data such as the enterprise's annual income. The continuous annual income is divided by time into multiple small sets, and each grade corresponds to an average annual income amount. The average of each small set is compared with the amount for each grade: an average of 1000W (10 million) or more corresponds to a factor variable of 10 (grade 10), 500W or more corresponds to 8, and so on.
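The income grading example above can be sketched as a threshold table. Only the first two rows (1000W gives 10, 500W gives 8) come from the text; the remaining thresholds, the window length and the function names are hypothetical.

```python
# (threshold in units of W = 10,000, grade). Only the first two rows are from
# the text; the lower rows are hypothetical placeholders for "and so on".
GRADE_TABLE = [(1000, 10), (500, 8), (200, 6), (100, 4), (0, 2)]


def grade_income(avg_income_w):
    """Map an average annual income (in W) to its grade."""
    for threshold, grade in GRADE_TABLE:
        if avg_income_w >= threshold:
            return grade
    return 0  # below every threshold (for example, a negative average)


def grade_small_sets(yearly_income_w, window):
    """Split the income series by time into small sets and grade each by its mean."""
    small_sets = [yearly_income_w[i:i + window]
                  for i in range(0, len(yearly_income_w), window)]
    return [grade_income(sum(s) / len(s)) for s in small_sets]


grades = grade_small_sets([1200, 1100, 500, 700], window=2)
```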
Of course, the amounts above are only an example for ease of understanding. Some data are not amounts but text; these also need to be graded or discretized after text matching in the manner described above, except that the text is compared against specific keywords. The grades themselves are divided according to the actual situation of the enterprises raising funds. When the enterprises taking part in the current fund raising assessment are strong overall, fewer grades can be used and the requirement for each grade is higher; when they are weak overall, more grades are used and the requirement for the lower grades is lower, which mainly helps small enterprises take part and makes poverty alleviation more widespread.
Of course, the grading above is carried out with some grading model. The analysis model is trained on the standardized data; in other words, training can use the data labels directly, or it can use the data features obtained during grading.
In the present embodiment, after grading the object's internal representation data and external representation data according to the preset fund raising grades to obtain at least one data subset, the method further includes:
Performing feature analysis on the data subset and extracting the data features shared by the data in the data subset;
Deriving data features from those data features so as to expand the data similar to the data in the data subset, where derivation means further subdividing a data feature or extending it to similar features, so that the representation data set is divided more accurately.
In practical applications, not all data features need to be derived; derivation can be applied to only some of them. Preferably, the preset grade data features to derive are chosen according to the actual grading of the enterprise's data. For example, a column may be given a derived variant by converting the feature to a one-hot type. Not every column is processed; columns are screened methodically according to the characteristics of the data, and this is an experimental process. We can test several different feature processing methods on the representation data and finally choose the best feature engineering algorithm based on the results.
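As an illustration of the one-hot derivation mentioned above, a single categorical column can be expanded into one 0/1 column per category. The column values and the function name `one_hot` are hypothetical.

```python
def one_hot(values):
    """Expand one categorical column into one 0/1 column per category."""
    categories = sorted(set(values))  # fixed column order for reproducibility
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Hypothetical "enterprise type" column.
categories, encoded = one_hot(["private", "state", "private"])
```

Whether such a derived column actually helps is, as the text says, something to settle by experiment.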
Further, feature derivation is implemented as follows. During the comparison step of grading, it is also judged whether each data feature can be derived; derivation is of course performed on data features of the same grade. Whether derivation is possible is verified against the type and scenario of the data feature within the grade. Derivation must also respect the plausibility of the representation data themselves and cannot go beyond the enterprise's actual development as reflected in the representation data; excessive derivation would bias the subsequently trained model and make the assessment of the enterprise's fund raising inaccurate. For example, if the current type of a data feature is A and type B is similar to type A, the development shown in the representation data is used to judge whether the feature can be derived to type B. If it can, derivation is performed and the other features under type B are obtained as features for this comparison. This increases the number of features while still meeting the requirements of the enterprise deduction, and greatly improves the accuracy of the assessment.
In the present embodiment, feature derivation includes two kinds: feature differentiation and feature search-and-add. For feature differentiation, a differentiation method can be selected according to the type of the data feature or the category of the data subset, and each data feature is split with that method; each small feature split out is a phrase that belongs to the same type as, or has the same meaning as, the original data feature.
For feature search-and-add, words can be formed according to the semantics of the data feature within the data subset to obtain more similar features, and from these similar features those belonging to the same category as the data in the data subset are selected.
In the present embodiment, in the step of generating the recommended fund raising plan, the plan can also be generated by training a model on the obtained object data set: a deduction fund raising model is obtained by training, and feeding the corresponding business data into this model for prediction outputs the corresponding fund raising plan.
In practical applications, the model is trained on the representation data after grading to obtain the fund raising prediction model.
In this case, the model is trained on known enterprise representation data, which the enterprise itself or experts have labelled and exported as data tables. For example, enterprise A has portrait dimensions f1 to fn and label 1, which maps to an enterprise fund raising amount, publicity strength and publicity market; likewise enterprise B has f1 to fn and label 2, and so on. In this way we collect a data set with known labels. The model is built and trained to learn regularities from this labelled data set. Take the simplest linear model, y = a1f1 + a2f2 + a3f3 + ...: training on the labelled data yields the values of a1, a2, a3, ..., where f(n) are the representation data and the resulting a(n) are the coefficients of the model function.
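The text does not say how the coefficients a1, a2, ... of the linear model are obtained. A minimal sketch, assuming ordinary least squares solved through the normal equations (the function name `fit_linear` and the toy data are hypothetical):

```python
def fit_linear(features, labels):
    """Least-squares fit of y = a1*f1 + ... + an*fn (no intercept).

    Solves (F^T F) a = F^T y by Gauss-Jordan elimination with partial pivoting.
    Assumes F^T F is invertible; a singular system raises ZeroDivisionError.
    """
    n = len(features[0])
    # Augmented matrix [F^T F | F^T y].
    aug = [[sum(row[i] * row[j] for row in features) for j in range(n)]
           + [sum(row[i] * y for row, y in zip(features, labels))]
           for i in range(n)]
    for col in range(n):
        # Bring the largest remaining pivot into position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


# Toy labelled set generated from y = 2*f1 + 3*f2.
coefficients = fit_linear([[1, 0], [0, 1], [1, 1], [2, 1]], [2, 3, 5, 7])
```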
In this case, besides the linear model above, the LightGBM model is preferred for training. Training this kind of model uses gradients to split and select the feature data, which reduces the amount of data to be computed and greatly improves training efficiency. The training proceeds as follows: the object data set is first divided into multiple data subsets, which are then used as training input.
Input: training data, number of iterations d, sampling rate a of large-gradient data, sampling rate b of small-gradient data, the loss function, and the type of weak learner (generally a decision tree);
Output: trained strong learner;
(1) Sort the sample points in descending order of the absolute value of their gradients;
(2) Select the top a*100% of the sorted samples to form the subset of large-gradient sample points;
(3) From the remaining (1-a)*100% of the samples, randomly select b*(1-a)*100% sample points to form the set of small-gradient sample points;
(4) Merge the large-gradient samples with the sampled small-gradient samples;
(5) Multiply the small-gradient samples by the weight coefficient (1-a)/b;
(6) Learn a new weak learner from the samples obtained above;
(7) Repeat steps (1) to (6) until the specified number of iterations is reached or the model converges.
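Sampling steps (1) to (5) above can be sketched as follows. The sketch follows the steps as written in this text; the published LightGBM description instead draws b*100% of the whole data set in step (3), so the exact sample count should be treated as an assumption. The function name `sample_by_gradient` and the seed are hypothetical.

```python
import random


def sample_by_gradient(gradients, a, b, seed=0):
    """Return (indices, weights) of the samples used to fit the next weak learner."""
    rng = random.Random(seed)
    n = len(gradients)
    # (1) Sort sample indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    # (2) Keep the top a*100% as the large-gradient subset.
    top_n = int(a * n)
    large, rest = order[:top_n], order[top_n:]
    # (3) Randomly draw a fraction b of the remaining (1-a)*100% samples.
    small = rng.sample(rest, int(b * len(rest)))
    # (4) Merge both groups; (5) weight the small-gradient samples by (1-a)/b.
    indices = large + small
    weights = [1.0] * len(large) + [(1 - a) / b] * len(small)
    return indices, weights


grads = [0.1, -0.9, 0.3, 0.05, 0.8, -0.2, 0.4, -0.6, 0.02, 0.7]
indices, weights = sample_by_gradient(grads, a=0.2, b=0.5)
```

Step (6) would then fit a weak learner on `indices`, using `weights` when accumulating gradient statistics.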
The algorithm above can greatly increase the speed of model learning without changing the data distribution and without losing learner precision.
In the present embodiment, training the fund raising prediction on the at least one data subset according to the model training algorithm to obtain the fund raising prediction model, and predicting the ecological deduction of the object to be predicted with the fund raising prediction model to output the fund raising prediction result, includes:
When training with the training algorithm of the LightGBM model, matching the LightGBM model training architecture corresponding to the grade of the data subset according to its grading result, and inputting the data subset into that model architecture for training to obtain the fund raising prediction model, where the fund raising prediction model is:
Where Obj is the output of the fund raising prediction model; n > 1; y_i is the label value of the label column after the representation data are standardized; ŷ_i is the estimated value of y_i; f_t(x_t) = f(x) is the approximate calculation function of the feature values in the representation data; f_t is the approximate target value when the feature value is x_t; i is the index of the label column entry; x_t = x is the feature value corresponding to y_i; t is the index of the feature value; and constant, L and Ω denote constant terms;
Obtaining the internal representation data and external representation data of the object to be predicted, inputting them into the fund raising prediction model, and outputting the fund raising grade corresponding to the object to be predicted;
According to the fund raising grade, querying the corresponding fund raising assessment report from a preset table of correspondences between fund raising grades and fund raising assessment reports.
After querying the corresponding fund raising assessment report from the preset table of correspondences between fund raising grades and fund raising assessment reports according to the fund raising grade, the method further includes:
Calculating the minimum value of the fund raising prediction model, and judging the feasibility of the generated fund raising report based on that minimum value, the minimization formula being:
Where R1(j, s) = {x | x^(j) ≤ s} and R2(j, s) = {x | x^(j) > s} are the value regions of a dimension of the representation data, i is the index of the label column entry, j is the splitting feature of the representation data, s is the split point, and c_i is the absolute value corresponding to y_i.
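The minimization formula itself is not reproduced in this text. The definitions of R1(j, s) and R2(j, s) suggest the usual least-squares split search of a regression tree; the sketch below assumes that criterion, and the function name `best_split` and the toy data are hypothetical.

```python
def best_split(xs, ys):
    """Search every feature j and split point s for the lowest squared error.

    xs: list of feature vectors, ys: label values.
    Returns (j, s, loss), where R1 = {x | x[j] <= s} and R2 = {x | x[j] > s}.
    """
    def squared_error(values):
        if not values:
            return 0.0
        c = sum(values) / len(values)  # the best constant for a region is its mean
        return sum((v - c) ** 2 for v in values)

    best = None
    for j in range(len(xs[0])):
        # Skip the largest value so that R2 is never empty.
        for s in sorted({x[j] for x in xs})[:-1]:
            r1 = [y for x, y in zip(xs, ys) if x[j] <= s]
            r2 = [y for x, y in zip(xs, ys) if x[j] > s]
            loss = squared_error(r1) + squared_error(r2)
            if best is None or loss < best[2]:
                best = (j, s, loss)
    return best


split = best_split([[1, 5], [2, 3], [3, 8], [4, 1]], [1, 1, 5, 5])
```

How the resulting minimum is turned into a feasibility judgement (for example, by comparing it with a threshold) is not stated in the text.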
Specifically, the process of training the LightGBM model on the obtained object data set to get the fund raising prediction model is as follows:
Assume the object data set has n instances X1, ..., Xn with feature dimension s. At each gradient iteration, the negative gradients of the model's loss function with respect to the data are denoted g1, ..., gn. The decision tree assigns the data to nodes via the optimal split point (the point of maximum information gain), and the split data are then trained in the preset model architecture of the LightGBM model to obtain the final fund raising plan prediction model, whose function formula is:
Where y_i is the label value of the label column after the representation data are standardized; f_t(X_t) is the approximate calculation function of the feature values in the obtained representation data; X_t is the feature value corresponding to y_i; j is the splitting feature of the representation data; s is the split point; and constant is a constant term.
Further, a Taylor expansion of the above fund raising plan prediction model gives the approximate target value of the enterprise's fund raising plan; the calculation formula is:
Then the minimum value of the above model's function is found, and the feasibility of the enterprise's fund raising plan is judged from that minimum value; the minimization formula is:
Where R1(j, s) = {x | x^(j) ≤ s} and R2(j, s) = {x | x^(j) > s} are the value regions of a dimension of the representation data, and c_i is the absolute value corresponding to y_i.
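The three formulas referred to above are not reproduced in this text. As a hedged reading only, the symbols that are defined (Obj, L, Ω, constant, y_i, ŷ_i, f_t, R1, R2) match the standard gradient-boosting objective, its second-order Taylor approximation, and the least-squares split criterion. In that standard form, which is an assumption here, L is a loss function, Ω a regularization term, t indexes the boosting round, and c_1, c_2 are the constants fitted in each region:

```latex
% Assumed standard objective at boosting round t
\[
\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} L\bigl(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\bigr)
  + \Omega(f_t) + \text{constant}
\]

% Second-order Taylor expansion around the previous prediction
\[
\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n} \Bigl[ L\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)
  + \frac{\partial L}{\partial \hat{y}_i^{(t-1)}}\, f_t(x_i)
  + \frac{1}{2}\, \frac{\partial^2 L}{\partial \bigl(\hat{y}_i^{(t-1)}\bigr)^2}\, f_t(x_i)^2 \Bigr]
  + \Omega(f_t) + \text{constant}
\]

% Least-squares split search over feature j and split point s
\[
\min_{j,\,s} \Bigl[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2
  + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \Bigr]
\]
```

The first derivative in the expansion is the gradient itself, whereas the g1, ..., gn of the text are described as negative gradients; the sign convention of the original formulas cannot be confirmed from this text.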
LightGBM uses a leaf-wise growth strategy: each time it finds, among all current leaves, the leaf with the largest split gain (usually also the one with the most data), splits it, and repeats. This can grow rather deep decision trees and cause over-fitting, so LightGBM adds a maximum depth limit on top of leaf-wise growth, preventing over-fitting while keeping efficiency high. LightGBM also optimizes support for categorical features: they can be input directly without an extra 0/1 expansion, and decision rules for categorical features are added to the decision tree algorithm. In data parallelism, reduce-scatter (Reduce scatter) distributes the histogram-merging task across different machines, reducing communication and computation, and histogram subtraction further halves the communication volume. Voting-based data parallelism (Parallel Voting) further optimizes the communication cost of data parallelism so that it becomes constant.
In summary, the LightGBM model is robust, prevents over-fitting well, and is further accelerated, so that it computes faster and uses less memory; these are the main reasons for choosing the LightGBM model.
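The properties described above map onto parameters of the LightGBM library. The fragment below is only a configuration sketch: the parameter names are LightGBM's own, but every value is a hypothetical choice rather than anything stated in this text.

```python
# Hypothetical values; only the parameter names are LightGBM's own.
lightgbm_params = {
    "num_leaves": 31,        # cap on leaves grown by the leaf-wise strategy
    "max_depth": 6,          # depth limit added on top of leaf-wise growth
    "tree_learner": "data",  # data-parallel learner; "voting" selects voting parallel
}
```

Such a dictionary would typically be passed as the `params` argument of `lightgbm.train`, with categorical columns declared through the `categorical_feature` argument of `lightgbm.Dataset` so that no 0/1 expansion is needed.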
The corresponding fund raising scale is generated from the maximum fund raising range and the maximum bearing ability grade of the assets, in combination with the fund raising plan prediction model.
In the present embodiment, the fund raising plan prediction model can be obtained in advance by training on a large amount of fund raising planning data. Of course, the model can also be built from theory as required and then trained continuously during actual prediction to improve its precision.
Further, after grading the object's internal representation data and external representation data according to the preset fund raising grades to obtain at least one data subset, the method further includes:
Scoring the at least one data subset using the weight ratio coefficients in a scoring model;
Selecting, according to the scoring result, the higher-scoring data subsets from the at least one data subset as the valid data set for training the fund raising prediction model.
In practical applications, when training the prediction model, the data are scored and screened by introducing weight ratios. The weight ratios can of course also be applied in the data pre-processing step, so that the data are screened in advance. This can be implemented with a scoring model based on known specific indicators such as enterprise fund raising, publicity strength and publicity market. We consider enterprise fund raising and publicity strength the most important scoring indicators and assume a weight of 0.3 for each, so these two indicators together carry a weight of 0.6; the remaining indicators share the remaining weight equally, summing to 0.4. The specific values of each indicator are then normalized and weighted. In the end, each enterprise can be quantified into one specific label value. This completes the construction of the label column (that is, the pre-processing of the data).
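The weighted scoring described above can be sketched as follows. The weights 0.3 + 0.3 and the equally shared 0.4 come from the text; min-max normalization, the indicator names and the function names are assumptions.

```python
def min_max(values):
    """Scale a list of values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def score_enterprises(indicators,
                      key_indicators=("fund_raising", "publicity_strength"),
                      key_weight=0.3):
    """indicators: {indicator name: [value per enterprise]} (hypothetical layout)."""
    others = [k for k in indicators if k not in key_indicators]
    # The key indicators get 0.3 each; the rest share what remains equally.
    remaining = 1.0 - key_weight * len(key_indicators)
    weights = {k: key_weight for k in key_indicators}
    weights.update({k: remaining / len(others) for k in others})
    normalized = {k: min_max(v) for k, v in indicators.items()}
    count = len(next(iter(indicators.values())))
    return [sum(weights[k] * normalized[k][i] for k in indicators)
            for i in range(count)]


scores = score_enterprises({
    "fund_raising": [100, 300, 200],
    "publicity_strength": [1, 3, 2],
    "publicity_market": [5, 5, 10],
    "host_city": [2, 4, 2],
})
```

Each resulting score is the quantified label value of one enterprise; the higher-scoring subsets would then be kept for training.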
In the present embodiment, besides predicting the fund raising scale with the model above, a simple classification with some analysis mechanism can also be chosen. This is not very suitable when the data volume is large, and its logic is not fully rigorous. However, if resources or time are limited, part of the representation data can be extracted for a simple classification: correlation tests are run against the label data, and the representation data with higher correlation are used as the basis for division. This is also a simple and practical scheme.
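The text does not name the correlation test. A minimal sketch, assuming the Pearson coefficient and a hypothetical cut-off of 0.5 (the function names and the toy columns are also hypothetical):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return 0.0 if std_x == 0 or std_y == 0 else cov / (std_x * std_y)


def select_features(columns, labels, threshold=0.5):
    """Keep the representation data columns strongly correlated with the label."""
    return [name for name, values in columns.items()
            if abs(pearson(values, labels)) >= threshold]


selected = select_features(
    {"income": [1, 2, 3, 4], "noise": [2, 1, 2, 1]},
    labels=[1, 2, 3, 4],
)
```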
Fig. 2 is the flow chart of a specific implementation of the data analysing method for fund raising prediction, trained with the LightGBM model, according to the embodiment of the present invention. The method includes the following steps:
Step S210: over an internet connection, obtain from data systems and websites related to the enterprise the representation data of the enterprise for which the fund raising plan is to be predicted;
In this step, the representation data obtained are specifically the enterprise ranking, business impact index, enterprise scale, annual income, enterprise type, enterprise classification, historical annual fund raising strength, fund raising category, required publicity strength, publicity market, host city, and so on.
Step S220: process the data format of the representation data and derive features;
In this step, the obtained data are first put into tabular form, that is, stored in different cells of a data table under generated column labels. Not all of the obtained information is useful; some may be unnecessary or redundant. The representation data are therefore standardized by masking the label column, filling missing columns with values predicted by a tree model, applying a box-plot outlier test, and so on.
Further, the feature data in each kind of representation data are bucketed, for example by binning continuous data such as the enterprise's annual income and mapping each bin to a specific value. Year-on-year and period-on-period (ring ratio) features are derived for items such as the enterprise's net income and debt.
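The year-on-year and period-on-period derivation above can be sketched with a single lagged growth function. Quarterly data, the lag values and the function name `growth` are assumptions for illustration.

```python
def growth(series, lag):
    """Relative change against the value `lag` periods earlier.

    Returns None where no earlier value exists or where it is zero.
    """
    return [None if i < lag or series[i - lag] == 0
            else (series[i] - series[i - lag]) / series[i - lag]
            for i in range(len(series))]


# Hypothetical quarterly net income.
net_income = [100, 110, 120, 130, 150]
ring_ratio = growth(net_income, lag=1)    # against the previous quarter
year_on_year = growth(net_income, lag=4)  # against the same quarter last year
```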
In this step, data without a type label can also be placed in the data table by constructing a label column: the enterprise portraits, such as the enterprise's fund raising, publicity strength, publicity market and host city, are combed through the scoring model to produce a quantified label.
In the present embodiment, features can also be graded according to the nature of the data. For continuous data such as the enterprise's annual income, the data can be graded; for example, 1000W or more corresponds to a factor variable of 10, 500W or more to 8, and so on. Text data are likewise graded or discretized after text matching.
Step S230: train the fund raising prediction model on the processed representation data with the LightGBM model.
In practical applications, the representation data are first split to obtain data subsets, and the split point of each subset is determined; the LightGBM model is then trained on the subsets using these split points. Take LightGBM's splitting strategy for categorical features as an example, illustrated with the colour set red, yellow, blue and green: the four classes of samples are divided into two groups in every possible way, for example red and yellow in one group and blue and green in the other. A feature with k categories has 2^(k-1) - 1 such strategies (7 for the four colours). Enumerating them all fully exploits the information contained in this feature dimension and finds the optimal split, but the time complexity of finding the optimal split this way is very high. For regression trees there is an efficient solution that needs only about O(k * log(k)) to find the optimal partition. The basic idea is to reorder the categories according to their correlation with the training objective. More specifically, the histogram (of the categorical feature) is re-sorted by its accumulated value (sum_gradient / sum_hessian), and the best split point is then searched in the sorted histogram. Based on this split point, the data set is input into the training architecture of the LightGBM model, giving the following model formula:
A Taylor expansion of the above fund raising plan prediction model gives the approximate target value of the enterprise's fund raising plan; the calculation formula is:
Then the minimum value of the above model's function is found, and the feasibility of the enterprise's fund raising plan is judged from that minimum value; the minimization formula is:
Where R1(j, s) = {x | x^(j) ≤ s} and R2(j, s) = {x | x^(j) > s} are the value regions of a dimension of the representation data.
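The category re-sorting described for step S230 can be sketched as follows: per-category gradient and hessian sums are accumulated, the categories are sorted by their ratio, and only the k - 1 prefixes of the sorted list are tried as splits. The gain formula used here (G_L^2/H_L + G_R^2/H_R - G^2/H, without regularization) is an assumption, as are the function name and the toy gradients.

```python
def best_category_split(categories, gradients, hessians):
    """Return (left_group, gain) for the best two-way split of a categorical feature.

    Assumes every hessian is positive.
    """
    # Accumulate the gradient and hessian sums per category (the histogram).
    stats = {}
    for c, g, h in zip(categories, gradients, hessians):
        sum_g, sum_h = stats.get(c, (0.0, 0.0))
        stats[c] = (sum_g + g, sum_h + h)
    # Re-sort the histogram by sum_gradient / sum_hessian.
    ordered = sorted(stats, key=lambda c: stats[c][0] / stats[c][1])
    total_g = sum(g for g, _ in stats.values())
    total_h = sum(h for _, h in stats.values())
    best_gain, best_left = float("-inf"), None
    left_g = left_h = 0.0
    # Scan the k - 1 split positions of the sorted histogram.
    for k in range(len(ordered) - 1):
        g, h = stats[ordered[k]]
        left_g, left_h = left_g + g, left_h + h
        right_g, right_h = total_g - left_g, total_h - left_h
        gain = (left_g ** 2 / left_h + right_g ** 2 / right_h
                - total_g ** 2 / total_h)
        if gain > best_gain:
            best_gain, best_left = gain, set(ordered[:k + 1])
    return best_left, best_gain


colours = ["red", "red", "yellow", "blue", "green"]
left, gain = best_category_split(colours, [-2.0, -2.0, -1.8, 2.0, 2.2],
                                 [1.0, 1.0, 1.0, 1.0, 1.0])
```

With these toy gradients the search groups red with yellow and blue with green, the same grouping used as the example in the text.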
Step S240: obtain the enterprise's current data, input them into the fund raising prediction model, and output the predicted fund raising plan result.
Further, when training the model on the split data, weight ratios are also introduced to score and screen the split data subsets. The weight ratios can of course also be applied in the data pre-processing step, so that the data are screened in advance. This can be implemented with a scoring model based on known specific indicators such as enterprise fund raising, publicity strength and publicity market. We consider enterprise fund raising and publicity strength the most important scoring indicators and assume a weight of 0.3 for each, so these two indicators together carry a weight of 0.6; the remaining indicators share the remaining weight equally, summing to 0.4. The specific values of each indicator are then normalized and weighted. In the end, each enterprise can be quantified into one specific label value. This completes the construction of the label column; the model is then trained, which further improves the accuracy of the model's prediction results.
To solve the above problem, the present invention also provides data analysis equipment that can be used to implement the data analysing method provided by the embodiments of the present invention. Physically the equipment takes the form of a server, whose hardware implementation is shown in Fig. 1.
Referring to Fig. 3, the server includes a processor 301 such as a CPU, a communication bus 302, a user interface 303, a network interface 304 and a memory 305. The communication bus 302 provides the connections and communication between these components. The user interface 303 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); the network interface 304 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 305 may be high-speed RAM or non-volatile memory such as disk storage. The memory 305 may optionally also be a storage device independent of the processor 301.
Those skilled in the art will understand that the hardware structure shown in Fig. 3 does not limit the data analysis equipment, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
As shown in Fig. 3, the memory 305, as a computer readable storage medium, may include an operating system, a network communication module, a user interface module and a data analysis program. The operating system is the program that manages the data analysis set-up and its software resources and supports the running of the data analysis program and other software and/or programs.
In the hardware structure of the server shown in Fig. 3, the network interface 304 is mainly used to access the network; the user interface 303 is mainly used to communicate with the external internet or with servers that provide business data, in order to retrieve the enterprise's various credit and asset information; and the processor 301 can call the data analysis program stored in the memory 305 and perform the operations of the embodiments of the data analysing method.
In this embodiment of the invention, Fig. 3 may also be implemented as a touch-operated mobile terminal such as a mobile phone; the processor of the mobile terminal reads program code, stored in a buffer or storage unit, that implements the data analysing method, and thereby performs the deduction and prediction of the enterprise's fund raising plan.
To solve the above problem, an embodiment of the present invention further provides a data analysis device. Referring to Fig. 4, Fig. 4 is a schematic diagram of the functional modules of the data analysis device provided by an embodiment of the present invention. In this embodiment, the device includes:
Data acquisition module 41, configured to receive a data analysis request sent by a terminal and, based on the object to be analyzed in the data analysis request, obtain a corresponding object data set, the object data set including at least internal representation data and external representation data of the object;
Data grading module 42, configured to divide the internal representation data and external representation data of the object into grades according to preset fund raising grades, obtaining at least one data subset, the data subsets corresponding one-to-one with the objects to be analyzed;
Computing module 43, configured to calculate, according to the data subset, the maximum fund raising range corresponding to the object to be analyzed and the maximum bearing capacity grade of its assets;
Prediction module 44, configured to select a corresponding model training algorithm according to the maximum fund raising range and the maximum bearing capacity grade of its assets; to train on the data subset for fund raising prediction according to the model training algorithm, obtaining a fund raising prediction model; and to perform ecological engineering prediction on an object to be predicted based on the fund raising prediction model, outputting a fund raising prediction result.
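The order in which the four modules hand data to one another can be sketched as follows. This is only an illustration of the data flow: the function `run_analysis`, the `demo_*` stand-ins, the record fields, the threshold of 100.0 and the algorithm names are all hypothetical and are not specified by the text.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Subsets = Dict[str, List[Record]]


def run_analysis(
    request: Dict[str, str],
    acquire: Callable[[str], List[Record]],
    grade: Callable[[List[Record]], Subsets],
    compute: Callable[[Subsets], Tuple[float, int]],
    select_algorithm: Callable[[float, int], str],
    train: Callable[[str, Subsets], Callable[[Record], str]],
    target: Record,
) -> str:
    """Chain the four modules in the order the text lists them."""
    data_set = acquire(request["object_id"])        # data acquisition module 41
    subsets = grade(data_set)                       # data grading module 42
    max_range, capacity_grade = compute(subsets)    # computing module 43
    algorithm = select_algorithm(max_range, capacity_grade)  # prediction module 44
    model = train(algorithm, subsets)
    return model(target)


# Hypothetical stand-ins so the wiring can be exercised end to end.
def demo_acquire(object_id: str) -> List[Record]:
    return [{"object_id": object_id,
             "internal": {"assets": 120.0},
             "external": {"credit": 0.8}}]


def demo_grade(records: List[Record]) -> Subsets:
    subsets: Subsets = {}
    for record in records:
        key = "A" if record["internal"]["assets"] >= 100.0 else "B"
        subsets.setdefault(key, []).append(record)
    return subsets


def demo_compute(subsets: Subsets) -> Tuple[float, int]:
    assets = [r["internal"]["assets"] for rs in subsets.values() for r in rs]
    return max(assets), len(subsets)


def demo_select(max_range: float, capacity_grade: int) -> str:
    return "lightgbm" if max_range >= 100.0 else "baseline"


def demo_train(algorithm: str, subsets: Subsets) -> Callable[[Record], str]:
    top_grade = sorted(subsets)[0]
    return lambda record: f"{algorithm}:{top_grade}"


result = run_analysis({"object_id": "enterprise-1"}, demo_acquire, demo_grade,
                      demo_compute, demo_select, demo_train,
                      {"object_id": "enterprise-2"})
```

Passing each module in as a callable keeps the sketch independent of how any one module is actually implemented.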
In this embodiment, the data analysis device further includes a format conversion module, configured to preprocess the representation data in the object data set. The preprocessing converts the representation data into the data format required by the data analysis system, yielding a standardized object data set.
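A minimal sketch of such a format conversion, assuming (as the claims describe) that standardization means ordering the label columns as the training data expects, adding missing label columns with blank data, and dropping redundant ones. The function names, the column names and the use of `None` as the blank value are illustrative assumptions.

```python
from typing import Any, Dict, List, Optional


def find_missing(record: Dict[str, Any], label_columns: List[str]) -> List[str]:
    """Label columns the expected format requires but the record lacks."""
    return [column for column in label_columns if column not in record]


def find_redundant(record: Dict[str, Any], label_columns: List[str]) -> List[str]:
    """Columns present in the record that the expected format does not use."""
    return [column for column in record if column not in label_columns]


def standardize_record(
    record: Dict[str, Any],
    label_columns: List[str],
    blank: Optional[Any] = None,
) -> Dict[str, Any]:
    """Return a copy of the record whose columns follow label_columns exactly.

    Missing columns are filled with the blank value, redundant columns are
    dropped, and the result keeps the order given by label_columns.
    """
    return {column: record.get(column, blank) for column in label_columns}


# Hypothetical expected format and raw record.
expected = ["revenue", "assets", "credit_score"]
raw = {"credit_score": 710, "revenue": 5.2, "nickname": "acme"}
standardized = standardize_record(raw, expected)
```

Here the redundant column is deleted outright; masking it as invalid, which the claims also allow, would need an extra flag per column.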
In this embodiment, the device further includes a judgment module, configured to calculate the minimum value of the fund raising prediction model and to judge, based on the minimum value, the feasibility of the recommended fund raising plan that is generated.
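The formula for this minimum value is not reproduced in this text. The symbols the claims define for it (a splitting feature j, a split point s, and the two regions R_1(j, s) and R_2(j, s) on either side of s) match the least-squares split search commonly used for regression trees, so the sketch below assumes that reading. The function names and the sample data are hypothetical, and how the resulting minimum is turned into a feasibility judgment is not stated in the text.

```python
from typing import List, Sequence, Tuple


def squared_error(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty region)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((value - mean) ** 2 for value in values)


def best_split(
    rows: List[Sequence[float]], targets: List[float]
) -> Tuple[float, int, float]:
    """Return (minimum loss, feature index j, split point s).

    For each feature j and candidate split point s, the samples are divided
    into R_1 (x[j] <= s) and R_2 (x[j] > s); the loss is the total squared
    error of each region around its own mean.
    """
    best = (float("inf"), -1, 0.0)
    for j in range(len(rows[0])):
        for s in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, targets) if row[j] <= s]
            right = [y for row, y in zip(rows, targets) if row[j] > s]
            if not left or not right:
                continue  # a split must leave samples on both sides
            loss = squared_error(left) + squared_error(right)
            if loss < best[0]:
                best = (loss, j, s)
    return best


# Hypothetical data: one feature, with a clear jump in the label after 2.0.
rows = [[1.0], [2.0], [3.0], [4.0]]
targets = [1.0, 1.0, 5.0, 5.0]
minimum, feature, split_point = best_split(rows, targets)
```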
Because this device embodiment is based on the same description as the data analysis method embodiments of the present invention, the content of the data analysis device embodiment is not repeated in detail here.
In this embodiment, the internal representation data and external representation data of an enterprise are obtained according to a data analysis request, a planning analysis of the fund raising data is carried out, and a preliminary fund raising analysis result is formed. Based on that result, the enterprise's ecological balance is deduced and a corresponding fund raising plan is generated, forming a funding mechanism in which fund raising is reconciled with revenue during the fund raising process. Funds are prepared on the basis of this funding mechanism, which avoids the mismatch between fund raising and income caused by blind fund raising planning. Because the fund raising plan is formed from both internal and external enterprise data, the precision of the system's planning analysis is greatly improved, the maximum benefit of the enterprise's fund raising is ensured, and the enterprise's enthusiasm for raising funds for poverty alleviation is also improved.
The present invention also provides a kind of computer readable storage mediums.
In this embodiment, a data analysis program is stored on the computer readable storage medium, and when executed by a processor the data analysis program implements the steps of the data analysis method described in any of the above embodiments. For the method implemented when the data analysis program is executed by the processor, reference may be made to the embodiments of the data analysis method of the present invention, so details are not repeated here.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific embodiments described, which are merely illustrative rather than restrictive. Under the inspiration of the present invention, those of ordinary skill in the art can devise many further forms without departing from the purpose of the present invention and the scope protected by the claims. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, likewise falls within the protection of the present invention.

Claims (10)

1. A data analysis method, characterized in that the data analysis method includes:
Receiving a data analysis request sent by a terminal and, based on the object to be analyzed in the data analysis request, obtaining a corresponding object data set, the object data set including at least internal representation data and external representation data of the object;
Dividing the internal representation data and external representation data of the object into grades according to preset fund raising grades to obtain at least one data subset, the data subsets corresponding one-to-one with the objects to be analyzed;
Calculating, according to the data subset, the maximum fund raising range corresponding to the object to be analyzed and the maximum bearing capacity grade of its assets;
Selecting a corresponding model training algorithm according to the maximum fund raising range and the maximum bearing capacity grade of its assets;
Training on the data subset for fund raising prediction according to the model training algorithm to obtain a fund raising prediction model, performing ecological engineering prediction on an object to be predicted based on the fund raising prediction model, and outputting a fund raising prediction result.
2. The data analysis method according to claim 1, characterized in that, after the step of obtaining a corresponding object data set based on the object to be analyzed in the data analysis request, the method further includes:
Obtaining the data format of the training data set used by the fund raising prediction model, the data format including the label columns, the sort order of the label columns, and the storage position of the data;
Adjusting, according to the data format, the label columns in the internal representation data and external representation data to follow the sort order, and detecting whether any label column is missing or redundant;
If a label column is missing from the external representation data and internal representation data, adding the missing label column at the corresponding position in the internal representation data and external representation data and filling in blank data, to form a standardized object data set;
If a redundant label column exists in the external representation data and internal representation data, deleting the redundant label column and its corresponding data from the data set, or masking them by marking them invalid, to form a standardized object data set.
3. The data analysis method according to claim 2, characterized in that, after dividing the internal representation data and external representation data of the object into grades according to preset fund raising grades to obtain at least one data subset, the method further includes:
Scoring the data subsets using the weight ratio coefficients in a preset scoring model to obtain scoring results;
Sorting the data subsets in descending order according to the scoring results, and selecting the N highest-scoring data subsets as the valid data set for training the fund raising prediction model, where N >= 1.
4. The data analysis method according to any one of claims 1-3, characterized in that, after dividing the internal representation data and external representation data of the object into grades according to preset fund raising grades to obtain at least one data subset, the method further includes:
Performing feature analysis on the data subset to obtain the data features shared by the data in the data subset;
Performing feature derivation on the data features to obtain data similar to the data in the data subset, where feature derivation means further subdividing the data features or extending them with similar features.
5. The data analysis method according to claim 4, characterized in that training on the data subset for fund raising prediction according to the model training algorithm to obtain a fund raising prediction model, performing ecological engineering prediction on the object to be predicted based on the fund raising prediction model, and outputting a fund raising prediction result includes:
When training with the training algorithm of the LightGBM model, matching, according to the grade division result of the data subset, the LightGBM model training architecture corresponding to the grade of the data subset, and inputting the data subset into the model architecture for training to obtain the fund raising prediction model, where the fund raising prediction model is:
Where Obj is the output result of the fund raising prediction model, n > 1, y_i is the label value of a label column after the representation data has been standardized, ŷ_i is the estimated value of y_i, f_t(x_t) = f(x) is the approximate calculation function of the feature values in the representation data, f_t is the approximate target value when the feature value is x_t, i is the index of the label column, x_t = x is the feature value corresponding to y_i, t is the index of the feature value, and constant, L and Ω denote constant terms;
Obtaining the internal representation data and external representation data of the object to be predicted, inputting them into the fund raising prediction model, and outputting the fund raising grade corresponding to the object to be predicted;
Looking up, according to the fund raising grade, the corresponding fund raising assessment report in a preset correspondence table between fund raising grades and fund raising assessment reports.
6. The data analysis method according to claim 5, characterized in that, after looking up, according to the fund raising grade, the corresponding fund raising assessment report in a preset correspondence table between fund raising grades and fund raising assessment reports, the method further includes:
Calculating the minimum value of the fund raising prediction model, and judging the feasibility of the generated fund raising report based on the minimum value, the formula for the minimum value being:
Where R_1(j, s) = {x | x^(j) ≤ s} and R_2(j, s) = {x | x^(j) > s} are the value ranges of the dimension intervals in the representation data, i is the index of the label column, j is the splitting feature of the representation data, s is the split point, and c_i is the absolute value corresponding to y_i.
7. A data analysis device, characterized in that the data analysis device includes:
A data acquisition module, configured to receive a data analysis request sent by a terminal and, based on the object to be analyzed in the data analysis request, obtain a corresponding object data set, the object data set including at least internal representation data and external representation data of the object;
A data grading module, configured to divide the internal representation data and external representation data of the object into grades according to preset fund raising grades, obtaining at least one data subset, the data subsets corresponding one-to-one with the objects to be analyzed;
A computing module, configured to calculate, according to the data subset, the maximum fund raising range corresponding to the object to be analyzed and the maximum bearing capacity grade of its assets;
A prediction module, configured to select a corresponding model training algorithm according to the maximum fund raising range and the maximum bearing capacity grade of its assets; to train on the data subset for fund raising prediction according to the model training algorithm, obtaining a fund raising prediction model; and to perform ecological engineering prediction on an object to be predicted based on the fund raising prediction model, outputting a fund raising prediction result.
8. The data analysis device according to claim 7, characterized in that the prediction module includes a model training unit and a report generation unit;
The model training unit is configured to, when training with the training algorithm of the LightGBM model, match, according to the grade division result of the data subset, the LightGBM model training architecture corresponding to the grade of the data subset, and input the data subset into the model architecture for training to obtain the fund raising prediction model, where the fund raising prediction model is:
Where Obj is the output result of the fund raising prediction model, n > 1, y_i is the label value of a label column after the representation data has been standardized, ŷ_i is the estimated value of y_i, f_t(x_t) = f(x) is the approximate calculation function of the feature values in the representation data, f_t is the approximate target value when the feature value is x_t, i is the index of the label column, x_t = x is the feature value corresponding to y_i, t is the index of the feature value, and constant, L and Ω denote constant terms;
The report generation unit is configured to obtain the internal representation data and external representation data of the object to be predicted, input them into the fund raising prediction model, and output the fund raising grade corresponding to the object to be predicted; and to look up, according to the fund raising grade, the corresponding fund raising assessment report in a preset correspondence table between fund raising grades and fund raising assessment reports.
9. A data analysis equipment, characterized in that the data analysis equipment includes: a memory, a processor, and a data analysis program stored on the memory and runnable on the processor, the data analysis program, when executed by the processor, implementing the steps of the data analysis method according to any one of claims 1-6.
10. A computer readable storage medium, characterized in that a data analysis program is stored on the computer readable storage medium, the data analysis program, when executed by a processor, implementing the steps of the data analysis method according to any one of claims 1-6.
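Claims 5 and 8 introduce the fund raising prediction model with a formula that is not reproduced in this text. Assuming the symbols defined there (Obj, n, y_i, its estimate, f_t, L, Ω and constant) follow the usual gradient boosting convention, the standard objective they most plausibly correspond to is shown below. This is a reconstruction under that assumption, not the patent's own formula.

```latex
% Assumed reconstruction of the missing objective; standard gradient
% boosting form, not taken from the patent's figures.
\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} L\!\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + \text{constant}
```

In this standard form, L is a loss function, Ω is a regularization term on the tree added in round t, and t indexes the boosting round. The claims instead describe L and Ω as constant terms and t as the index of the feature value, so the correspondence should be treated with caution.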
CN201910479378.7A 2019-06-04 2019-06-04 Data analysis method, device, equipment and computer readable storage medium Active CN110310012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910479378.7A CN110310012B (en) 2019-06-04 2019-06-04 Data analysis method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910479378.7A CN110310012B (en) 2019-06-04 2019-06-04 Data analysis method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110310012A true CN110310012A (en) 2019-10-08
CN110310012B CN110310012B (en) 2023-07-28

Family

ID=68075289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910479378.7A Active CN110310012B (en) 2019-06-04 2019-06-04 Data analysis method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110310012B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110956303A (en) * 2019-10-12 2020-04-03 未鲲(上海)科技服务有限公司 Information prediction method, device, terminal and readable storage medium
CN112598341A (en) * 2021-03-08 2021-04-02 工福(北京)科技发展有限公司 Data processing system and method for idle article poverty alleviation platform
CN114298427A (en) * 2021-12-30 2022-04-08 北京金堤科技有限公司 Enterprise attribute data prediction method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101149821A (en) * 2007-05-15 2008-03-26 佟辛 Complete information based dynamically interactive type enterprise finance model construction and operation method
US20140136294A1 (en) * 2012-11-13 2014-05-15 Creat Llc Comprehensive quantitative and qualitative model for a real estate development project
CN105719069A (en) * 2016-01-15 2016-06-29 中国南方电网有限责任公司电网技术研究中心 Method and system for measuring enterprise cash flows
KR101750825B1 (en) * 2016-04-01 2017-06-26 주식회사 조이펀 A crowd funding platform system for securing investors by continuously providing objective investing information and the funding method by using the same
CN107767259A (en) * 2017-09-30 2018-03-06 平安科技(深圳)有限公司 Loan risk control method, electronic installation and readable storage medium storing program for executing

Also Published As

Publication number Publication date
CN110310012B (en) 2023-07-28

Similar Documents

Publication Publication Date Title
Feng et al. An expert recommendation algorithm based on Pearson correlation coefficient and FP-growth
Tsui et al. Knowledge-based extraction of intellectual capital-related information from unstructured data
Chatterjee et al. Supplier selection in Telecom supply chain management: a Fuzzy-Rasch based COPRAS-G method
CN103761254B (en) Method for matching and recommending service themes in various fields
CN106095942B (en) Strong variable extracting method and device
CN110288350A (en) User's Value Prediction Methods, device, equipment and storage medium
CN106484813A (en) A kind of big data analysis system and method
CN110222733A (en) The high-precision multistage neural-network classification method of one kind and system
CN114860916A (en) Knowledge retrieval method and device
CN113537807A (en) Enterprise intelligent wind control method and device
CN114861050A (en) Feature fusion recommendation method and system based on neural network
CN117271767A (en) Operation and maintenance knowledge base establishing method based on multiple intelligent agents
CN110310012A (en) Data analysing method, device, equipment and computer readable storage medium
Yiping et al. An improved multi-view collaborative fuzzy C-means clustering algorithm and its application in overseas oil and gas exploration
CN115145993A (en) Railway freight big data visualization display platform based on self-learning rule operation
Li et al. Drilling down artificial intelligence in entrepreneurial management: A bibliometric perspective
Minhas et al. An efficient algorithm for ranking candidates in e-recruitment system
Liu et al. Multi-task learning based high-value patent and standard-essential patent identification model
CN107424026A (en) Businessman's reputation evaluation method and device
Li Consumer behavior analysis model based on machine learning
CN112506930B (en) Data insight system based on machine learning technology
CN115293867A (en) Financial reimbursement user portrait optimization method, device, equipment and storage medium
Augenstein et al. Towards Value Proposition Mining" Exploration of Design Principles.
Alnoukari ASD-BI: An agile methodology for effective integration of data mining in business intelligence systems
CN115619238B (en) Method for establishing inter-enterprise cooperative relationship for non-specific B2B platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant