CN106940836A - Data analysis method and device - Google Patents

Data analysis method and device

Info

Publication number
CN106940836A
Authority
CN
China
Prior art keywords
data
analysis
preset
aggregate factor
data indicator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710108744.9A
Other languages
Chinese (zh)
Inventor
李悦
滕放
曹培坤
马超
赵继广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Causality Network Technology Co Ltd
Original Assignee
Beijing Causality Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/02 Agriculture; Fishing; Mining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Abstract

The present invention relates to the field of data science, and in particular to a data analysis method and device. In the method, for the acquired data of each incubation institution, the data indicators whose similarity exceeds a preset threshold are fused, based on a first preset correlation analysis method and a first preset feature engineering method, to obtain the fused aggregate factors; each aggregate factor is assigned to one of the determined analysis dimensions; the aggregate factors assigned to each analysis dimension are normalized, and the ranking grade of each incubation institution is calculated according to the preset weight of each analysis dimension. In this way, all of the data can be fully used and data from every aspect is taken into account. An evaluation system is designed in which the indicators are ultimately fused and assigned to a small number of determined analysis dimensions, so that incubation institutions are evaluated by objective figures rather than subjective judgment, which improves evaluation efficiency and safeguards objectivity and fairness.

Description

Data analysis method and device
Technical field
The present invention relates to the field of data science, and in particular to a data analysis method and device.
Background
At present, startup incubation is becoming increasingly technology-driven. Incubation institutions need to be evaluated so that their basic circumstances are understood and each institution can be managed and guided.
In the prior art, no complete evaluation system for incubation institutions has been established. Evaluation relies only on a few simple indicators, and there is no effective multi-dimensional analysis method, so the operating condition of each incubation institution cannot be fully reflected. Investment institutions selecting projects also tend to select people, a process in which subjective, one-sided judgment is hard to avoid and efficiency is low. Meanwhile, incubation institutions, as operating entities, have no multi-dimensional analysis framework either, and government departments find it hard to understand the situation and difficulties of enterprises and to provide assistance.
Summary of the invention
Embodiments of the present invention provide a data analysis method and device, to solve the problem in the prior art that data cannot be fully used to analyze and evaluate incubation institutions effectively.
The specific technical solutions provided by the embodiments of the present invention are as follows:
A data analysis method, comprising:
for the acquired data of each incubation institution, fusing, based on a first preset correlation analysis method and a first preset feature engineering method, the data indicators whose similarity exceeds a preset threshold, to obtain the fused aggregate factors;
assigning each aggregate factor to one of the determined analysis dimensions;
normalizing the aggregate factors assigned to each analysis dimension, and calculating the ranking grade of each incubation institution according to the preset weight of each analysis dimension.
In the embodiments of the present invention, for the acquired data of each incubation institution, the data indicators whose similarity exceeds a preset threshold are fused, based on a first preset correlation analysis method and a first preset feature engineering method, to obtain the fused aggregate factors; each aggregate factor is assigned to one of the determined analysis dimensions; the aggregate factors assigned to each analysis dimension are normalized, and the ranking grade of each incubation institution is calculated according to the preset weight of each analysis dimension. In this way, all of the data can be fully used and data from every aspect is taken into account. An evaluation system is designed in which the indicators are ultimately fused and assigned to a small number of determined analysis dimensions, so that incubation institutions are evaluated by objective figures rather than subjective judgment, which improves evaluation efficiency and safeguards objectivity and fairness.
Preferably, before the data indicators whose similarity exceeds the preset threshold are fused, the method further comprises:
extracting, from the data of each incubation institution, the features that can be used for comparison, and quantifying the qualitative data indicators;
selecting, according to a preset screening strategy, the data that satisfies the strategy, and supplementing the values of missing data indicators using a preset comparison method and a similar-observation analysis method.
Preferably, the first preset correlation analysis method is Pearson correlation analysis.
Preferably, fusing the data indicators whose similarity exceeds the preset threshold to obtain the fused aggregate factors specifically comprises:
performing derivation on each data indicator whose order of magnitude exceeds a preset value, and using a preset operation method to calculate the operation result of the data indicators whose similarity exceeds the preset value, to obtain the fused aggregate factors.
Preferably, assigning each aggregate factor to one of the determined analysis dimensions specifically comprises:
using a second preset correlation analysis method and a second preset feature engineering method to assign each aggregate factor to one of the determined analysis dimensions.
A data analysis device, comprising:
a fusion unit, configured to, for the acquired data of each incubation institution, fuse, based on a first preset correlation analysis method and a first preset feature engineering method, the data indicators whose similarity exceeds a preset threshold, to obtain the fused aggregate factors;
a classification unit, configured to assign each aggregate factor to one of the determined analysis dimensions;
a calculation unit, configured to normalize the aggregate factors assigned to each analysis dimension, and to calculate the ranking grade of each incubation institution according to the preset weight of each analysis dimension.
In the embodiments of the present invention, for the acquired data of each incubation institution, the data indicators whose similarity exceeds a preset threshold are fused, based on a first preset correlation analysis method and a first preset feature engineering method, to obtain the fused aggregate factors; each aggregate factor is assigned to one of the determined analysis dimensions; the aggregate factors assigned to each analysis dimension are normalized, and the ranking grade of each incubation institution is calculated according to the preset weight of each analysis dimension. In this way, all of the data can be fully used and data from every aspect is taken into account. An evaluation system is designed in which the indicators are ultimately fused and assigned to a small number of determined analysis dimensions, so that incubation institutions are evaluated by objective figures rather than subjective judgment, which improves evaluation efficiency and safeguards objectivity and fairness.
Preferably, the device further comprises a preprocessing unit, configured to, before the data indicators whose similarity exceeds the preset threshold are fused:
extract, from the data of each incubation institution, the features that can be used for comparison, and quantify the qualitative data indicators;
select, according to a preset screening strategy, the data that satisfies the strategy, and supplement the values of missing data indicators using a preset comparison method and a similar-observation analysis method.
Preferably, the first preset correlation analysis method is Pearson correlation analysis.
Preferably, when fusing the data indicators whose similarity exceeds the preset threshold to obtain the fused aggregate factors, the fusion unit is specifically configured to:
perform derivation on each data indicator whose order of magnitude exceeds a preset value, and use a preset operation method to calculate the operation result of the data indicators whose similarity exceeds the preset value, to obtain the fused aggregate factors.
Preferably, when assigning each aggregate factor to one of the determined analysis dimensions, the classification unit is specifically configured to:
use a second preset correlation analysis method and a second preset feature engineering method to assign each aggregate factor to one of the determined analysis dimensions.
Brief description of the drawings
Fig. 1 is a flowchart of the data analysis method in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the dimension weights for incubator evaluation in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the analysis dimensions for incubator evaluation in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the dimension weights for maker space evaluation in an embodiment of the present invention;
Fig. 5 is a schematic diagram of the analysis dimensions for maker space evaluation in an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the data analysis device in an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
To solve the problem in the prior art that data cannot be fully used to analyze and evaluate incubation institutions effectively, in the embodiments of the present invention, for the acquired data of each incubation institution, highly similar data are fused to obtain the fused aggregate factors; in a further fusion, the aggregate factors are assigned to the determined analysis dimensions; then, after normalization, the ranking grade of each incubation institution is calculated according to the weight of each analysis dimension.
The solution of the present invention is described in detail below through specific embodiments; of course, the present invention is not limited to the following embodiments.
Referring to Fig. 1, in an embodiment of the present invention, the specific flow of the data analysis method is as follows:
Step 100: for the acquired data of each incubation institution, based on a first preset correlation analysis method and a first preset feature engineering method, the data indicators whose similarity exceeds a preset threshold are fused to obtain the fused aggregate factors.
In practice, for example, an incubator or a maker space may contain multiple incubation institutions, and the operating condition of each incubation institution, as well as of the incubator or maker space, needs to be understood. However, each incubation institution may have a large amount of data covering many different indicators, both quantitative and qualitative. In the prior art, although the data volume of each incubation institution is relatively large, these data indicators are not used effectively: comparison is made only from a single angle, qualitative data indicators in particular are neither analyzed nor used, and there is no complete evaluation system, so the operating condition of each incubation institution cannot be fully reflected. In the embodiments of the present invention, qualitative data indicators are quantified, an evaluation system is designed, multiple analysis dimensions for evaluation are determined, and all data indicators are combined so that incubation institutions are evaluated from multiple angles.
Performing step 100 specifically includes:
First, for the acquired data of each incubation institution, the first preset correlation analysis method and the first preset feature engineering method are used to obtain the data indicators whose similarity exceeds the preset threshold.
The first preset correlation analysis method is, for example, Pearson correlation analysis, which is not limited in the embodiments of the present invention. The aim is to fuse highly similar data indicators, reducing the number of dimensions finally used for evaluation and making it easier to analyze and compare the incubation institutions.
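The similarity check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the indicator names, the sample values and the 0.9 threshold are hypothetical, and the Pearson coefficient itself is used as the similarity measure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant indicator carries no correlation information
    return cov / (norm_x * norm_y)


def similar_pairs(indicators, threshold=0.9):
    """Return indicator pairs whose correlation exceeds the threshold.

    `indicators` maps an indicator name to its values, one value per
    institution, in the same institution order for every indicator.
    """
    pairs = []
    for a, b in combinations(sorted(indicators), 2):
        if pearson(indicators[a], indicators[b]) > threshold:
            pairs.append((a, b))
    return pairs


# Hypothetical data for four institutions.
indicators = {
    "income": [100.0, 200.0, 300.0, 400.0],
    "profit": [10.0, 22.0, 29.0, 41.0],
    "media_mentions": [5.0, 1.0, 4.0, 2.0],
}
print(similar_pairs(indicators))
```

Pairs returned by `similar_pairs` are candidates for fusion into a single aggregate factor; whether strongly negative correlations should also count as "similar" is a design choice the source does not settle.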
Then, the data indicators whose similarity exceeds the preset threshold are fused to obtain the fused aggregate factors.
Specifically, derivation is performed on each data indicator whose order of magnitude exceeds a preset value, and a preset operation method is used to calculate the operation result of the data indicators whose similarity exceeds the preset value, to obtain the fused aggregate factors.
The preset operation method is, for example, accumulation, division or normalization; it can be selected and set according to actual requirements and is not limited in the embodiments of the present invention. The value of each aggregate factor can then be calculated according to the preset operation method.
The aggregate factors may be designed according to the feature engineering method, defined in advance by the user, or a combination of both. Preferably, the number of aggregate factors may be in the tens; in this way, a large number of data indicators are fused into a small, fixed number of aggregate factors, which are convenient for users to use and compare.
Thus, during fusion, derivation is performed on the values of data indicators whose order of magnitude exceeds the preset value, in order to avoid the influence of extreme values and uneven distribution; the derived data indicators are then normalized, so that the fusion is not affected by the orders of magnitude of the original indicators and the results are comparable.
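A rough sketch of this fusion step is given below. The source does not specify the derivation applied to large-magnitude indicators, so the sketch substitutes a `log10(1 + v)` compression purely as an assumption; min-max scaling and averaging stand in for the "preset operation method", and the magnitude limit of 1000 is likewise hypothetical.

```python
from math import log10


def min_max(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def compress(values, magnitude_limit=1000.0):
    """Dampen large-magnitude indicators (values assumed non-negative).

    ASSUMPTION: log10(1 + v) is a stand-in for the transformation the
    source calls "derivation", which it does not define.
    """
    if max(values) > magnitude_limit:
        return [log10(1.0 + v) for v in values]
    return list(values)


def fuse(columns, magnitude_limit=1000.0):
    """Fuse similar indicators into one aggregate factor per institution.

    Each column holds one indicator's values across institutions. The
    columns are compressed if needed, normalized, and then averaged.
    """
    scaled = [min_max(compress(col, magnitude_limit)) for col in columns]
    return [sum(row) / len(scaled) for row in zip(*scaled)]


income = [9.0, 99.0, 999.0, 9999.0]  # spans several orders of magnitude
profit = [0.0, 10.0, 20.0, 30.0]
print(fuse([income, profit]))
```

Normalizing after the compression means an indicator measured in the thousands does not dominate one measured in the tens when the two are combined.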
Further, before step 100 is performed, the method also includes:
First, the features that can be used for comparison are extracted from the data of each incubation institution, and the qualitative data indicators are quantified.
For example, data is obtained for nearly 200 incubation institutions in maker spaces, and the data of each incubation institution may contain on the order of a hundred data indicators. For data that contains textual description, or consists only of textual description (for example, an incubation institution's industry qualifications or the policy incentives it has received), useful numbers or usable features are extracted; that is, the raw data is split and structured.
When qualitative data indicators are quantified, the quantization rules may be formulated according to the importance, priority and so on of the indicators. For example, for the industry qualification of each incubation institution, the raw data describes the qualification only as one of three levels: national, city or district. In the embodiments of the present invention, such qualitative data is quantified: the national level may be quantified as 4, the city level as 3, the district level as 2, and a qualification not recognized at the national, city or district level as 1. The data is thereby converted into quantified data indicators that are easy to calculate, use and compare subsequently.
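The qualification example above maps directly onto a lookup table. In the sketch below the scores 4, 3, 2 and 1 come from the text, while the English level labels and the function name are hypothetical.

```python
# Scores taken from the example in the text; the string labels are assumed.
QUALIFICATION_SCORES = {"national": 4, "city": 3, "district": 2}


def quantize_qualification(level):
    """Convert a qualitative qualification level into a numeric indicator.

    Anything not recognized at national, city or district level,
    including a missing entry, is scored 1.
    """
    if level is None:
        return 1
    return QUALIFICATION_SCORES.get(level.strip().lower(), 1)


raw_levels = ["National", "city", "district", None, "unrated"]
print([quantize_qualification(level) for level in raw_levels])
```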
In addition, when the data of each incubation institution is processed, the units of the data must be unified, and the data indicators must remain comparable with one another when they are split.
Then, according to a preset screening strategy, the data that satisfies the strategy is selected, and the values of missing data indicators are supplemented using a preset comparison method and a similar-observation analysis method.
In the embodiments of the present invention, different screening strategies can be formulated according to the type and value of each data indicator, so as to exclude extreme and abnormal values. For example, for the floor area of an incubation institution, a screening strategy is formulated based on an understanding of floor area, excluding values that are negative, too small or too large, so that they do not interfere with subsequent calculations, in particular with the distribution of the overall scores.
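A screening strategy of this kind can be sketched as a simple range filter. The field name `floor_area` and the bounds of 10 and 1,000,000 are hypothetical; a real strategy would set them from domain knowledge of the indicator.

```python
def screen_floor_area(records, min_area=10.0, max_area=1_000_000.0):
    """Keep records whose floor area lies within a plausible range.

    Negative, too-small and too-large values are excluded. Records with
    a missing floor area are kept so that the value can be supplemented
    in a later step.
    """
    kept = []
    for record in records:
        area = record.get("floor_area")
        if area is None or min_area <= area <= max_area:
            kept.append(record)
    return kept


records = [
    {"name": "A", "floor_area": 500.0},
    {"name": "B", "floor_area": -20.0},        # negative: excluded
    {"name": "C", "floor_area": 2.0},          # too small: excluded
    {"name": "D", "floor_area": 5_000_000.0},  # too large: excluded
    {"name": "E", "floor_area": None},         # missing: kept for later
]
print([r["name"] for r in screen_floor_area(records)])
```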
Moreover, for some incubation institutions the value of a given data indicator may not have been recorded. In order to analyze and evaluate all incubation institutions correctly, the values of such data indicators can be supplemented. When missing data is supplemented, critical values are taken into account and the preset comparison method and similar-observation analysis are used; rather than simply filling in the median or the mean, the overall numerical distribution is considered, which further improves the accuracy of the supplemented data.
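One way to supplement a missing value from similar observations, rather than from a global median or mean, is to average the value over the most similar institutions. The source does not define its comparison or similar-observation methods, so the nearest-neighbour approach below is an assumption, and all field names are hypothetical.

```python
def squared_distance(a, b, features):
    """Squared Euclidean distance between two records over `features`."""
    return sum((a[f] - b[f]) ** 2 for f in features)


def impute_from_similar(records, target, features, k=2):
    """Fill missing `target` values from the k most similar records.

    ASSUMPTION: similarity is Euclidean distance over `features`, which
    should already be on comparable scales. Returns new records and
    leaves the input untouched.
    """
    complete = [r for r in records if r.get(target) is not None]
    filled = []
    for record in records:
        if record.get(target) is not None or not complete:
            filled.append(dict(record))
            continue
        nearest = sorted(
            complete, key=lambda c: squared_distance(record, c, features)
        )[:k]
        estimate = sum(c[target] for c in nearest) / len(nearest)
        filled.append({**record, target: estimate})
    return filled


records = [
    {"name": "A", "staff": 10, "tenants": 20, "floor_area": 1000.0},
    {"name": "B", "staff": 12, "tenants": 22, "floor_area": 1200.0},
    {"name": "C", "staff": 100, "tenants": 200, "floor_area": 9000.0},
    {"name": "D", "staff": 11, "tenants": 21, "floor_area": None},
]
filled = impute_from_similar(records, "floor_area", ["staff", "tenants"])
print(filled[3]["floor_area"])
```

Here institution D resembles A and B far more than C, so its floor area is estimated from A and B only, which keeps the large outlier C from distorting the supplemented value.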
That is, after the data of each incubation institution has been acquired, the data is first preprocessed into data indicators that can be calculated, used and compared, which facilitates the subsequent analysis and calculation.
Step 110: each aggregate factor is assigned to one of the determined analysis dimensions.
The analysis dimensions may be user-defined, discovered by the feature engineering method, or a combination of both. Preferably, the number of analysis dimensions may be 9, which is not limited in the embodiments of the present invention; the aim is to evaluate each incubation institution from a limited number of analysis dimensions.
Performing step 110 specifically includes: using a second preset correlation analysis method and a second preset feature engineering method to assign each aggregate factor to one of the determined analysis dimensions.
It is worth noting that performing step 110 is in fact also a data fusion process. The second preset correlation analysis method may be the same as the first preset correlation analysis method, and the second preset feature engineering method may be the same as the first preset feature engineering method; neither is limited in the embodiments of the present invention. The aim is to fuse the aggregate factors into a smaller number of analysis dimensions that serve as an evaluation framework better suited to, and more convenient for, the user's analysis and evaluation.
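For the user-defined case, assigning aggregate factors to analysis dimensions amounts to grouping by a lookup table. The sketch below shows only that case; the factor names and the particular factor-to-dimension assignments are hypothetical, and a correlation-driven assignment is not shown.

```python
from collections import defaultdict

# Hypothetical user-defined assignment of aggregate factors to dimensions.
FACTOR_TO_DIMENSION = {
    "incubator_income": "overall operation",
    "incubator_profit": "overall operation",
    "incubator_employees": "employment promotion",
    "resident_enterprises": "resident enterprise competitiveness",
}


def group_by_dimension(factor_values, mapping):
    """Group one institution's aggregate factors by analysis dimension."""
    grouped = defaultdict(dict)
    for factor, value in factor_values.items():
        if factor not in mapping:
            raise KeyError(f"no analysis dimension defined for {factor!r}")
        grouped[mapping[factor]][factor] = value
    return dict(grouped)


factors = {
    "incubator_income": 0.8,
    "incubator_profit": 0.6,
    "incubator_employees": 0.4,
}
print(group_by_dimension(factors, FACTOR_TO_DIMENSION))
```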
Step 120: the aggregate factors assigned to each analysis dimension are normalized, and the ranking grade of each incubation institution is calculated according to the preset weight of each analysis dimension.
Performing step 120 specifically includes:
First, the aggregate factors assigned to each analysis dimension are normalized.
Preferably, the value of each aggregate factor after normalization is expressed in index form.
Thus, after normalization, the data under each analysis dimension is comparable: the unit that the data represents is no longer considered, and the numerical value directly indicates strength under a given analysis dimension.
Then, the ranking grade of each incubation institution is calculated according to the preset weight of each analysis dimension.
The weight of each analysis dimension can be set by the user according to actual requirements.
Thus, a total score is calculated from the values of the aggregate factors in each analysis dimension and the weights, and the ranking grade of each incubation institution is obtained from the total score. The scores under individual analysis dimensions can also be used to pick out incubation institutions that perform outstandingly, or need strengthening, in a given analysis dimension. In this way, all of the data can be fully used, and incubation institutions are evaluated by objective figures rather than subjective impressions, which improves evaluation efficiency and safeguards objectivity and fairness.
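Step 120 can be sketched as a weighted sum over normalized dimension scores. Min-max scaling is assumed here as the normalization, since the source does not name one; the dimension names, weights and raw values are hypothetical.

```python
def min_max(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_institutions(dimension_scores, weights):
    """Rank institutions by the weighted sum of normalized scores.

    `dimension_scores` maps institution -> {dimension: raw value};
    `weights` maps dimension -> weight. Returns the ranking (best
    first) and the total score of every institution.
    """
    names = sorted(dimension_scores)
    totals = {name: 0.0 for name in names}
    for dimension, weight in weights.items():
        column = min_max([dimension_scores[n][dimension] for n in names])
        for name, value in zip(names, column):
            totals[name] += weight * value
    ranking = sorted(names, key=lambda n: (-totals[n], n))
    return ranking, totals


scores = {
    "A": {"operation": 10.0, "employment": 5.0},
    "B": {"operation": 30.0, "employment": 1.0},
    "C": {"operation": 20.0, "employment": 9.0},
}
weights = {"operation": 0.6, "employment": 0.4}
ranking, totals = rank_institutions(scores, weights)
print(ranking, totals)
```

Keeping the per-dimension columns available, instead of only the totals, is what allows an institution that is strong or weak in one particular dimension to be singled out.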
The above embodiment is described further below using several specific application scenarios.
First application scenario: incubator evaluation.
Refer to Fig. 2, a schematic diagram of the dimension weights for incubator evaluation in an embodiment of the present invention.
For example, refer to Fig. 3, a schematic diagram of the analysis dimensions for incubator evaluation in an embodiment of the present invention. For a given incubator, 8 analysis dimensions are determined: overall operation index, employment promotion index, resident enterprise competitiveness index, institutional regional expansion index, media attention index, company operation index, high-quality project incubation index, and investment community popularity index.
First, after the data of each incubation institution in the incubator is acquired, the data of each incubation institution is preprocessed, which includes splitting and structuring, quantification, screening and supplementation.
Then, the data indicators are fused into aggregate factors, which are assigned to the 8 analysis dimensions above.
For example, Fig. 2 lists incubator income, incubator profit, resident enterprises, number of incubator employees, number of incubated high-tech enterprises, expansion index and so on. The data indicators listed in Fig. 2 are only an illustrative example; depending on actual conditions, the data indicators acquired will differ. The percentage corresponding to each data indicator in Fig. 2 is its weight, which the user can also set according to actual requirements.
As can be seen from Fig. 2, the raw data acquired is relatively large in volume and contains many data indicators, and some of the data cannot be used directly in subsequent calculations. Therefore, in the embodiments of the present invention, the raw data is processed, fused and classified to obtain values under the 8 determined analysis dimensions, which not only organizes the data by theme but also comprehensively reflects the operating condition of the incubation institution.
Finally, the ranking grade is calculated according to the weight of each analysis dimension.
Second application scenario: maker space evaluation.
Refer to Fig. 4, a schematic diagram of the dimension weights for maker space evaluation in an embodiment of the present invention.
For example, refer to Fig. 5, a schematic diagram of the analysis dimensions for maker space evaluation in an embodiment of the present invention. For a given maker space, 9 analysis dimensions are determined: overall operation index, employment promotion index, resident enterprise competitiveness index, institutional regional expansion index, media attention index, co-working index, training and mentoring index, investment community popularity index, and popular industry coverage index.
Fig. 4 also gives examples of several different data indicators, such as maker space income, resident enterprises, resident startup teams and number of people employed. Of course, different or additional data indicators may be acquired for a maker space according to actual conditions, and the percentage corresponding to each data indicator can also be set by the user; none of this is limited in the embodiments of the present invention.
In this way, the incubation institutions in an incubator or maker space can be evaluated. The operating condition of every aspect of an incubation institution is measured through quantified, objective, multi-dimensional analysis, avoiding subjective and vague evaluation and improving evaluation efficiency and objectivity. Incubation institutions that perform outstandingly overall or in a particular aspect can be picked out quickly, and the areas in which an incubation institution is underdeveloped can be identified so that targeted improvement measures can be taken. By strengthening cooperation, the overall development of incubation institutions is promoted, which provides significant guidance for their management and development.
Based on the above embodiment, referring to Fig. 6, in an embodiment of the present invention the data analysis device specifically includes:
a fusion unit 60, configured to, for the acquired data of each incubation institution, fuse, based on a first preset correlation analysis method and a first preset feature engineering method, the data indicators whose similarity exceeds a preset threshold, to obtain the fused aggregate factors;
a classification unit 61, configured to assign each aggregate factor to one of the determined analysis dimensions;
a calculation unit 62, configured to normalize the aggregate factors assigned to each analysis dimension, and to calculate the ranking grade of each incubation institution according to the preset weight of each analysis dimension.
Preferably, the device further includes a preprocessing unit 63, configured to, before the data indicators whose similarity exceeds the preset threshold are fused:
extract, from the data of each incubation institution, the features that can be used for comparison, and quantify the qualitative data indicators;
select, according to a preset screening strategy, the data that satisfies the strategy, and supplement the values of missing data indicators using a preset comparison method and a similar-observation analysis method.
Preferably, the first preset correlation analysis method is Pearson correlation analysis.
Preferably, when fusing the data indicators whose similarity exceeds the preset threshold to obtain the fused aggregate factors, the fusion unit 60 is specifically configured to:
perform derivation on each data indicator whose order of magnitude exceeds a preset value, and use a preset operation method to calculate the operation result of the data indicators whose similarity exceeds the preset value, to obtain the fused aggregate factors.
Preferably, when assigning each aggregate factor to one of the determined analysis dimensions, the classification unit 61 is specifically configured to:
use a second preset correlation analysis method and a second preset feature engineering method to assign each aggregate factor to one of the determined analysis dimensions.
In summary, in the embodiments of the present invention, for the acquired data of each incubator, the data indicators whose similarity exceeds a preset threshold are fused, based on a first preset correlation analysis method and a first preset feature engineering method, to obtain the aggregated factors after fusion; each aggregated factor is classified into one of the determined analysis dimensions; and the aggregated factors classified into each analysis dimension are normalized, and the ranking grade of each incubator is calculated according to the preset weight of each analysis dimension. In this way, all of the data can be fully used and data from every aspect taken into account when designing the competition system. The indicators are ultimately fused and classified under a small number of determined analysis dimensions, so that incubators are selected on the basis of objective figures rather than subjective judgment, which improves the efficiency of the competition and ensures objectivity and fairness.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the embodiments of the present invention without departing from the spirit and scope of the embodiments of the present invention. Thus, if these modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

1. A data analysis method, characterized by comprising:
for the acquired data of each incubator, fusing the data indicators whose similarity exceeds a preset threshold, based on a first preset correlation analysis method and a first preset feature engineering method, to obtain the aggregated factors after fusion;
classifying each aggregated factor into one of the determined analysis dimensions;
normalizing the aggregated factors classified into each analysis dimension, and calculating the ranking grade of each incubator according to the preset weight of each analysis dimension.
2. The method of claim 1, characterized in that, before the data indicators whose similarity exceeds the preset threshold are fused, the method further comprises:
extracting, from the data of each incubator, the features that can be used for comparison, and quantifying the qualitative data indicators;
screening out, using a preset screening strategy, the data that satisfy the screening strategy, and supplementing the values of missing data indicators using a preset comparison method and a similar-analysis observation method.
3. The method of claim 2, characterized in that the first preset correlation analysis method is Pearson correlation analysis.
4. The method of claim 2, characterized in that fusing the data indicators whose similarity exceeds the preset threshold to obtain the aggregated factors after fusion specifically comprises:
performing derivation on each data indicator whose order of magnitude exceeds a preset value, and computing, using a preset operation method, the operation result of the data indicators whose similarity exceeds the preset value, to obtain the aggregated factors after fusion.
5. The method of any one of claims 1-4, characterized in that classifying each aggregated factor into the determined analysis dimensions specifically comprises:
classifying each aggregated factor into one of the determined analysis dimensions using a second preset correlation analysis method and a second preset feature engineering method.
6. A data analysis device, characterized by comprising:
a fusion unit, configured to, for the acquired data of each incubator, fuse the data indicators whose similarity exceeds a preset threshold, based on a first preset correlation analysis method and a first preset feature engineering method, to obtain the aggregated factors after fusion;
a classification unit, configured to classify each aggregated factor into one of the determined analysis dimensions;
a computing unit, configured to normalize the aggregated factors classified into each analysis dimension, and to calculate the ranking grade of each incubator according to the preset weight of each analysis dimension.
7. The device of claim 6, characterized by further comprising a preprocessing unit which, before the data indicators whose similarity exceeds the preset threshold are fused, is configured to:
extract, from the data of each incubator, the features that can be used for comparison, and quantify the qualitative data indicators;
screen out, using a preset screening strategy, the data that satisfy the screening strategy, and supplement the values of missing data indicators using a preset comparison method and a similar-analysis observation method.
8. The device of claim 7, characterized in that the first preset correlation analysis method is Pearson correlation analysis.
9. The device of claim 7, characterized in that, when fusing the data indicators whose similarity exceeds the preset threshold to obtain the aggregated factors after fusion, the fusion unit is specifically configured to:
perform derivation on each data indicator whose order of magnitude exceeds a preset value, and compute, using a preset operation method, the operation result of the data indicators whose similarity exceeds the preset value, to obtain the aggregated factors after fusion.
10. The device of any one of claims 6-9, characterized in that, when classifying each aggregated factor into the determined analysis dimensions, the classification unit is specifically configured to:
classify each aggregated factor into one of the determined analysis dimensions using a second preset correlation analysis method and a second preset feature engineering method.
CN201710108744.9A 2017-02-27 2017-02-27 A kind of data analysing method and device Pending CN106940836A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710108744.9A CN106940836A (en) 2017-02-27 2017-02-27 A kind of data analysing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710108744.9A CN106940836A (en) 2017-02-27 2017-02-27 A kind of data analysing method and device

Publications (1)

Publication Number Publication Date
CN106940836A true CN106940836A (en) 2017-07-11

Family

ID=59468959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710108744.9A Pending CN106940836A (en) 2017-02-27 2017-02-27 A kind of data analysing method and device

Country Status (1)

Country Link
CN (1) CN106940836A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019134579A1 (en) * 2018-01-04 2019-07-11 深圳壹账通智能科技有限公司 Method, electronic device, and computer readable storage medium for selecting investment target
CN110458447A (en) * 2019-08-07 2019-11-15 软通动力信息技术有限公司 Innovative space evaluation method, device, computer equipment and storage medium
CN111209997A (en) * 2018-11-22 2020-05-29 北京国双科技有限公司 Data analysis method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019134579A1 (en) * 2018-01-04 2019-07-11 深圳壹账通智能科技有限公司 Method, electronic device, and computer readable storage medium for selecting investment target
CN111209997A (en) * 2018-11-22 2020-05-29 北京国双科技有限公司 Data analysis method and device
CN111209997B (en) * 2018-11-22 2023-04-07 北京国双科技有限公司 Data analysis method and device
CN110458447A (en) * 2019-08-07 2019-11-15 软通动力信息技术有限公司 Innovative space evaluation method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
Fu et al. Unbalanced double hierarchy linguistic term set: The TOPSIS method for multi-expert qualitative decision making involving green mine selection
CN107633265A (en) For optimizing the data processing method and device of credit evaluation model
US10083263B2 (en) Automatic modeling farmer
CN112541532B (en) Target detection method based on dense connection structure
CN109657721A (en) A kind of multi-class decision-making technique of combination fuzzy set and random forest tree
CN106940836A (en) A kind of data analysing method and device
CN107908536A (en) To the performance estimating method and system of GPU applications in CPU GPU isomerous environments
CN111199469A (en) User payment model generation method and device and electronic equipment
CN110781174A (en) Feature engineering modeling method and system using pca and feature intersection
CN103207804B (en) Based on the MapReduce load simulation method of group operation daily record
CN110363662A (en) A kind of personal credit points-scoring system
CN114385465A (en) Fault prediction method, equipment and storage medium
Hoque et al. Efficiency measurement on banking sector in Bangladesh
CN110866694A (en) Power grid construction project financial evaluation system and method
CN108805152A (en) A kind of scene classification method and device
CN104102716A (en) Imbalance data predicting method based on cluster stratified sampling compensation logic regression
CN1936887A (en) Automatic text classification method based on classification concept space
CN113159419A (en) Group feature portrait analysis method, device and equipment and readable storage medium
CN111930815A (en) Method and system for constructing enterprise portrait based on industry attribute and business attribute
Shi A Machine Learning Study on the Model Performance of Human Resources Predictive Algorithms
CN111325431A (en) Method for evaluating satellite system integration maturity
CN114510518B (en) Self-adaptive aggregation method and system for massive structured data and electronic equipment
Nabahat Two-stage DEA with fuzzy data
Yangailo et al. The Impact of Industrialisation on Zambia’s Economic Growth
Mohamadi Zanjirani et al. Strategies for developing native digital games: integrating theme analysis and mathematical programming approaches

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20170711)