CN106940836A - Data analysis method and device - Google Patents
- Publication number
- CN106940836A (application number CN201710108744.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- analysis
- preset
- aggregation factor
- data indicator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06395—Quality analysis or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/103—Workflow collaboration or project management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/02—Agriculture; Fishing; Mining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The present invention relates to the field of data science, and in particular to a data analysis method and device. In the method, for the acquired data of each incubation institution, data indicators whose similarity is greater than a preset threshold are fused, based on a first preset correlation analysis method and a first preset feature engineering method, to obtain fused aggregation factors; each aggregation factor is classified into one of the determined analysis dimensions; the aggregation factors classified into each analysis dimension are normalized, and the ranking grade of each incubation institution is calculated according to the preset weight value of each analysis dimension. In this way, all of the data can be fully used and every aspect of the data is taken into account: an evaluation system is designed, the data are ultimately fused and classified into a small number of determined analysis dimensions, and incubation institutions are evaluated using objective figures rather than subjective judgment, which improves evaluation efficiency and ensures objectivity and fairness.
Description
Technical field
The present invention relates to the field of data science, and in particular to a data analysis method and device.
Background
At present, startup incubation is becoming increasingly technology-oriented. Incubation institutions need to be evaluated so that their basic situation is understood and each institution can be managed and guided accordingly.
In the prior art, no complete evaluation system for incubation institutions has been established. Evaluation relies only on a few simple indicators, and there is no effective multi-dimensional analysis method, so the operating situation of each incubation institution cannot be fully reflected. Investment institutions tend to select projects by judging the people involved, which makes subjectivity and one-sidedness hard to avoid and keeps efficiency low. Meanwhile, incubation institutions, as operating entities, have no multi-dimensional analysis framework either, and government departments find it difficult to understand the situation and difficulties of enterprises and to provide support.
Summary of the invention
The embodiments of the present invention provide a data analysis method and device to solve the prior-art problem that data cannot be fully used to analyze and evaluate incubation institutions effectively.
The specific technical solutions provided by the embodiments of the present invention are as follows:
A data analysis method, comprising:
for the acquired data of each incubation institution, fusing, based on a first preset correlation analysis method and a first preset feature engineering method, the data indicators whose similarity is greater than a preset threshold, to obtain fused aggregation factors;
classifying each aggregation factor into one of the determined analysis dimensions;
normalizing the aggregation factors classified into each analysis dimension, and calculating the ranking grade of each incubation institution according to the preset weight value of each analysis dimension.
In the embodiments of the present invention, for the acquired data of each incubation institution, the data indicators whose similarity is greater than a preset threshold are fused, based on the first preset correlation analysis method and the first preset feature engineering method, to obtain fused aggregation factors; each aggregation factor is classified into one of the determined analysis dimensions; the aggregation factors classified into each analysis dimension are normalized, and the ranking grade of each incubation institution is calculated according to the preset weight value of each analysis dimension. In this way, all of the data can be fully used and every aspect of the data is taken into account: an evaluation system is designed, the data are ultimately fused and classified into a small number of determined analysis dimensions, and incubation institutions are evaluated using objective figures rather than subjective judgment, which improves evaluation efficiency and ensures objectivity and fairness.
Preferably, before the data indicators whose similarity is greater than the preset threshold are fused, the method further comprises:
extracting, from the data of each incubation institution, features that can be used for comparison, and quantifying the qualitative data indicators;
selecting, using a preset screening strategy, the data that satisfy the preset screening strategy, and supplementing the values of missing data indicators using a preset comparison method and an analysis of similar observations.
Preferably, the first preset correlation analysis method is Pearson correlation analysis.
Preferably, fusing the data indicators whose similarity is greater than the preset threshold to obtain the fused aggregation factors specifically comprises:
performing derivation on each data indicator whose order of magnitude is greater than a preset value, and calculating, using a preset operation method, the operation result of the data indicators whose similarity is greater than the preset value, to obtain the fused aggregation factors.
Preferably, classifying each aggregation factor into one of the determined analysis dimensions specifically comprises:
classifying each aggregation factor into one of the determined analysis dimensions using a second preset correlation analysis method and a second preset feature engineering method.
A data analysis device, comprising:
a fusion unit, configured to fuse, for the acquired data of each incubation institution and based on a first preset correlation analysis method and a first preset feature engineering method, the data indicators whose similarity is greater than a preset threshold, to obtain fused aggregation factors;
a classification unit, configured to classify each aggregation factor into one of the determined analysis dimensions;
a calculation unit, configured to normalize the aggregation factors classified into each analysis dimension, and to calculate the ranking grade of each incubation institution according to the preset weight value of each analysis dimension.
With this device, as with the method, the data indicators whose similarity is greater than a preset threshold are fused into aggregation factors, each aggregation factor is classified into one of the determined analysis dimensions, the classified aggregation factors are normalized, and the ranking grade of each incubation institution is calculated according to the preset weight value of each analysis dimension. All of the data can therefore be fully used and every aspect of the data is taken into account, and incubation institutions are evaluated using objective figures rather than subjective judgment, which improves evaluation efficiency and ensures objectivity and fairness.
Preferably, the device further comprises a preprocessing unit configured to, before the data indicators whose similarity is greater than the preset threshold are fused:
extract, from the data of each incubation institution, features that can be used for comparison, and quantify the qualitative data indicators;
select, using a preset screening strategy, the data that satisfy the preset screening strategy, and supplement the values of missing data indicators using a preset comparison method and an analysis of similar observations.
Preferably, the first preset correlation analysis method is Pearson correlation analysis.
Preferably, when fusing the data indicators whose similarity is greater than the preset threshold to obtain the fused aggregation factors, the fusion unit is specifically configured to:
perform derivation on each data indicator whose order of magnitude is greater than a preset value, and calculate, using a preset operation method, the operation result of the data indicators whose similarity is greater than the preset value, to obtain the fused aggregation factors.
Preferably, when classifying each aggregation factor into one of the determined analysis dimensions, the classification unit is specifically configured to:
classify each aggregation factor into one of the determined analysis dimensions using a second preset correlation analysis method and a second preset feature engineering method.
Brief description of the drawings
Fig. 1 is a flowchart of the data analysis method in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the composition of dimension weights for incubator evaluation in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the composition of analysis dimensions for incubator evaluation in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the composition of dimension weights for maker space evaluation in an embodiment of the present invention;
Fig. 5 is a schematic diagram of the composition of analysis dimensions for maker space evaluation in an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the data analysis device in an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
To solve the prior-art problem that data cannot be fully used to analyze and evaluate incubation institutions effectively, in the embodiments of the present invention the highly similar data in the acquired data of each incubation institution are fused to obtain fused aggregation factors. The aggregation factors are then fused further by classifying them into the determined analysis dimensions. After normalization, the ranking grade of each incubation institution is calculated according to the weight value of each analysis dimension.
The solution of the present invention is described in detail below through specific embodiments; the present invention is of course not limited to the following embodiments.
Referring to Fig. 1, the specific flow of the data analysis method in an embodiment of the present invention is as follows:
Step 100: For the acquired data of each incubation institution, fuse, based on the first preset correlation analysis method and the first preset feature engineering method, the data indicators whose similarity is greater than a preset threshold, to obtain fused aggregation factors.
In practice, for example, an incubator or a maker space may contain multiple incubation institutions, and the operating situation of each incubation institution, as well as of the incubator or maker space itself, needs to be understood. However, each incubation institution may have data for a large number of different indicators, including both quantitative and qualitative data indicators. In the prior art, the data volume of each institution is relatively large, yet these data indicators are not used effectively: comparison is made from a single angle only, qualitative data indicators in particular are not analyzed or used, and without a complete evaluation system the operating situation of each incubation institution cannot be fully reflected. In the embodiments of the present invention, qualitative data indicators are quantified, an evaluation system is designed, multiple analysis dimensions for evaluation are determined, and incubation institutions are evaluated from multiple angles by combining all data indicators.
Step 100 specifically includes:
First, for the acquired data of each incubation institution, the data indicators whose similarity is greater than the preset threshold are obtained using the first preset correlation analysis method and the first preset feature engineering method.
The first preset correlation analysis method is, for example, Pearson correlation analysis; the embodiments of the present invention place no limitation on this. The aim is to fuse data indicators with high similarity so as to reduce the number of dimensions ultimately used for evaluation, which makes it easier to analyze and compare the incubation institutions.
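The similarity check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the indicator names, the sample values, and the threshold of 0.9 are all assumptions, and only the Pearson coefficient itself comes from the text.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant indicator carries no correlation information
    return cov / sqrt(var_x * var_y)

def similar_pairs(indicators, threshold=0.9):
    """Return the indicator-name pairs whose correlation exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(indicators), 2)
        if pearson(indicators[a], indicators[b]) > threshold
    ]

# Hypothetical data: one value per incubation institution.
indicators = {
    "income": [10.0, 20.0, 30.0, 40.0],
    "profit": [1.0, 2.1, 2.9, 4.2],
    "media_mentions": [5.0, 1.0, 4.0, 2.0],
}
pairs = similar_pairs(indicators)  # candidate pairs for fusion
```

Pairs returned by `similar_pairs` are the candidates for fusion into a single aggregation factor; how the threshold is chosen is left open by the text.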
Then, the data indicators whose similarity is greater than the preset threshold are fused to obtain the fused aggregation factors.
Specifically: derivation is performed on each data indicator whose order of magnitude is greater than a preset value, and, using a preset operation method, the operation result of the data indicators whose similarity is greater than the preset value is calculated, giving the fused aggregation factors.
The preset operation method is, for example, accumulation, division, or normalization, and can be selected and set according to actual requirements; the embodiments of the present invention place no limitation on this. The value of each aggregation factor can then be calculated according to the preset operation method.
The aggregation factors may be designed according to the feature engineering method, defined in advance by the user, or obtained by combining both. Preferably, the number of aggregation factors may be in the tens. In this way, a large number of data indicators are fused into a small, fixed number of aggregation factors, which are convenient for the user to use and compare.
In the fusion process, derivation is applied to the values of the data indicators whose order of magnitude is greater than the preset value in order to avoid the influence of extreme values and uneven distribution. The data indicators are then normalized after derivation, so that the fusion is not affected by the original orders of magnitude of the indicators and the indicators become comparable.
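A minimal sketch of this fusion step is given below. The text does not define the "derivation" operation, so the sketch substitutes a logarithmic transform, `log10(1 + x)`, purely as an assumption; it is a common way to damp extreme values, not necessarily what the applicant intended. The magnitude limit of 1000, the min-max normalization, and the choice of accumulation as the preset operation are likewise illustrative.

```python
from math import log10

def damp(values, magnitude_limit=1000.0):
    """Compress an indicator whose largest value exceeds the limit.

    Assumption: log10(1 + x) stands in for the unspecified 'derivation' step.
    Values are expected to be non-negative.
    """
    if max(values) > magnitude_limit:
        return [log10(1.0 + v) for v in values]
    return list(values)

def min_max(values):
    """Rescale values to the range [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def fuse(indicator_columns, magnitude_limit=1000.0):
    """Fuse similar indicators into one aggregation factor by accumulation."""
    scaled = [min_max(damp(col, magnitude_limit)) for col in indicator_columns]
    return [sum(vals) for vals in zip(*scaled)]

# Hypothetical indicators that were found to be similar; one value per institution.
income = [9.0, 99.0, 999.0, 9999.0]   # spans several orders of magnitude
profit = [0.0, 1.0, 2.0, 4.0]
factor = fuse([income, profit])
```

Because each indicator is rescaled before accumulation, the large-valued `income` column does not dominate the small-valued `profit` column in the resulting factor.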
Further, before step 100 is performed, the method also includes:
First, features that can be used for comparison are extracted from the data of each incubation institution, and qualitative data indicators are quantified.
For example, data are obtained for nearly 200 incubation institutions in maker spaces, and the data of each institution may contain a hundred or more data indicators. For data that contain textual description or consist of textual description only, such as an institution's industry qualifications or the policy bonuses it has obtained, the useful figures or usable features are extracted; that is, the raw data are split and structured.
When qualitative data indicators are quantified, the corresponding quantization rule can be formulated according to the importance, priority, and so on of the data indicator. For example, for the industry qualification of each incubation institution, the acquired raw data describe the qualification only at three levels: national, city, and district. In the embodiments of the present invention, such qualitative data can be quantified as follows: national level as 4, city level as 3, district level as 2, and no recognition at national, city, or district level as 1. The data are thereby converted into quantified data indicators that are convenient for subsequent calculation, use, and comparison.
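The quantization rule in this example can be sketched as a simple lookup. The scores 4, 3, 2, and 1 come from the text; the English labels, the function name, and the handling of blank or missing entries are assumptions made for illustration.

```python
# Scores from the example above; the English level labels are assumed.
QUALIFICATION_SCORES = {"national": 4, "city": 3, "district": 2}

def quantify_qualification(level):
    """Map a textual qualification level to its score.

    Anything not recognised at national, city, or district level scores 1.
    """
    if level is None:
        return 1
    return QUALIFICATION_SCORES.get(level.strip().lower(), 1)

# Hypothetical raw entries for five institutions.
raw = ["National", "city", "district", "", None]
scores = [quantify_qualification(entry) for entry in raw]
```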
In addition, when the data of each incubation institution are processed, the units of the data must be unified, and comparability between data indicators must be ensured when they are split.
Then, using a preset screening strategy, the data that satisfy the preset screening strategy are selected, and the values of missing data indicators are supplemented using a preset comparison method and an analysis of similar observations.
In the embodiments of the present invention, different screening strategies can be formulated according to the type and value of each data indicator so as to exclude extreme values and outliers. For example, for the floor area of an incubation institution, a screening strategy is formulated, based on an understanding of floor area, that excludes values that are negative, too small, or too large, so that they do not interfere with subsequent calculations, in particular the distribution of the overall scores.
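A screening strategy of this kind can be sketched as a range check. The bounds, field names, and sample records below are hypothetical; the text only says that negative, too-small, and too-large floor areas are excluded. Keeping records with a missing value for later supplementation is also an assumption.

```python
def screen(records, field, lower, upper):
    """Split records into (kept, rejected) by whether `field` lies in [lower, upper]."""
    kept, rejected = [], []
    for record in records:
        value = record.get(field)
        # A missing value is kept here so that it can be supplemented later.
        if value is None or lower <= value <= upper:
            kept.append(record)
        else:
            rejected.append(record)
    return kept, rejected

# Hypothetical floor areas in square metres; the bounds are illustrative only.
institutions = [
    {"name": "A", "floor_area": 850.0},
    {"name": "B", "floor_area": -20.0},         # negative value
    {"name": "C", "floor_area": 3.0},           # implausibly small
    {"name": "D", "floor_area": 9_000_000.0},   # implausibly large
    {"name": "E", "floor_area": None},          # missing, handled separately
]
kept, rejected = screen(institutions, "floor_area", lower=50.0, upper=200_000.0)
```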
Moreover, for some incubation institutions the value of a given data indicator may not have been recorded. In that case, so that all incubation institutions can be analyzed and evaluated correctly, a value can be supplemented for the data indicator that has none. When missing data are supplemented, critical values are taken into account and the preset comparison method and the analysis of similar observations are used; the gap is not simply filled with the median or mean, but the overall numerical distribution is considered, which further improves the accuracy of the supplemented data.
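The text does not specify how the comparison method or the analysis of similar observations works. One plausible reading, sketched below strictly as an assumption, is to estimate a missing value from the institutions that are most similar on another, recorded indicator. The field names, the reference indicator, and the neighbour count `k` are all hypothetical.

```python
def fill_from_similar(records, target, reference, k=2):
    """Fill missing `target` values from the k records closest on `reference`.

    Assumed approach: a nearest-neighbour estimate, not the patent's stated method.
    """
    known = [r for r in records if r[target] is not None]
    if not known:
        raise ValueError("no record has a value for " + target)
    filled = []
    for record in records:
        if record[target] is not None:
            filled.append(dict(record))
            continue
        # Rank institutions with a known value by closeness on the reference indicator.
        nearest = sorted(known, key=lambda o: abs(o[reference] - record[reference]))[:k]
        estimate = sum(o[target] for o in nearest) / len(nearest)
        filled.append({**record, target: estimate})
    return filled

# Hypothetical records: staff count is known for all, floor area is missing for one.
records = [
    {"staff": 10, "floor_area": 200.0},
    {"staff": 12, "floor_area": 240.0},
    {"staff": 50, "floor_area": 1000.0},
    {"staff": 11, "floor_area": None},
]
completed = fill_from_similar(records, target="floor_area", reference="staff")
```

Compared with a global mean, which the large institution would pull upward, the estimate here is drawn only from institutions of comparable size.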
That is, after the data of each incubation institution are acquired, they are first preprocessed into data indicators that can be calculated and used and that are comparable, which facilitates subsequent analysis and calculation.
Step 110: Each aggregation factor is classified into one of the determined analysis dimensions.
The analysis dimensions may be user-defined, may be features found by the feature engineering method, or may be a combination of both. Preferably, the number of analysis dimensions may be 9; the embodiments of the present invention place no limitation on this. The aim is to evaluate each incubation institution from a limited number of analysis dimensions.
Step 110 specifically includes: classifying each aggregation factor into one of the determined analysis dimensions using the second preset correlation analysis method and the second preset feature engineering method.
It is worth noting that step 110 is in effect also a data fusion process. The second preset correlation analysis method may be the same as the first preset correlation analysis method, and the second preset feature engineering method may be the same as the first preset feature engineering method; the embodiments of the present invention place no limitation on this. The aim is to fuse the aggregation factors into a smaller number of analysis dimensions that serve as the evaluation framework, which better suits users and makes analysis and evaluation easier for them.
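For the user-defined case, classification can be sketched as a lookup from aggregation factor to analysis dimension. The factor names, dimension names, and values below are illustrative assumptions; the correlation-based variant mentioned in the text is not shown.

```python
from collections import defaultdict

def classify(factors, dimension_of):
    """Group aggregation-factor values under the analysis dimension assigned to each."""
    grouped = defaultdict(dict)
    for name, value in factors.items():
        grouped[dimension_of[name]][name] = value
    return dict(grouped)

# Hypothetical user-defined assignment of factors to dimensions.
dimension_of = {
    "income_factor": "overall operation",
    "staff_factor": "employment",
    "jobs_created_factor": "employment",
}
# Hypothetical aggregation-factor values for one institution.
factors = {"income_factor": 1.2, "staff_factor": 0.4, "jobs_created_factor": 0.9}
by_dimension = classify(factors, dimension_of)
```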
Step 120: The aggregation factors classified into each analysis dimension are normalized, and the ranking grade of each incubation institution is calculated according to the preset weight value of each analysis dimension.
Step 120 specifically includes:
First, the aggregation factors classified into each analysis dimension are normalized.
Preferably, the value of each aggregation factor after normalization is in index form.
After normalization, the data under each analysis dimension are comparable: the unit that the data represent is no longer considered, and the numerical magnitude directly indicates strength or weakness under a given analysis dimension.
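One way to obtain a unit-free index is to rescale each aggregation factor across all institutions onto a fixed range. The sketch below uses a 0 to 100 scale; that scale, the min-max formula, and the sample values are assumptions, since the text only says that the normalized value takes index form.

```python
def to_index(values, scale=100.0):
    """Rescale one aggregation factor across institutions to a 0..scale index."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # No spread between institutions: place everyone at the midpoint.
        return [scale / 2 for _ in values]
    return [scale * (v - lo) / (hi - lo) for v in values]

# Hypothetical factor values for three institutions.
factor_values = [2.0, 14.0, 8.0]
index = to_index(factor_values)
```

After this step a higher index simply means a stronger showing on that factor, whatever unit the underlying indicators were recorded in.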
Then, the ranking grade of each incubation institution is calculated according to the preset weight value of each analysis dimension.
The weight value of each analysis dimension can be set by the user according to actual requirements.
The total score is calculated from the values of the aggregation factors in each analysis dimension and the weight values, and the ranking grade of each incubation institution is obtained from the total score. The scores under individual analysis dimensions can also be used to identify incubation institutions that perform outstandingly, or need strengthening, under a particular analysis dimension. All of the data can thus be fully used, and incubation institutions are evaluated using objective figures rather than subjective impressions, which improves evaluation efficiency and ensures objectivity and fairness.
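The weighted total and the resulting ranking can be sketched as follows. The dimension names, weights, and per-dimension scores are hypothetical, and a plain weighted sum is assumed because the text does not give the exact formula.

```python
def rank_institutions(dimension_scores, weights):
    """Return (rank, name, total) tuples ordered by weighted total score, best first."""
    totals = {
        name: sum(weights[dim] * score for dim, score in scores.items())
        for name, scores in dimension_scores.items()
    }
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, total) for rank, (name, total) in enumerate(ordered, start=1)]

# Hypothetical weights (summing to 1) and index scores per analysis dimension.
weights = {"overall operation": 0.6, "employment": 0.4}
dimension_scores = {
    "A": {"overall operation": 80.0, "employment": 50.0},
    "B": {"overall operation": 60.0, "employment": 90.0},
    "C": {"overall operation": 40.0, "employment": 40.0},
}
ranking = rank_institutions(dimension_scores, weights)

# Per-dimension view: the strongest institution on a single dimension.
best_on_employment = max(dimension_scores, key=lambda n: dimension_scores[n]["employment"])
```

Keeping the per-dimension scores alongside the total is what allows an institution that is strong or weak on one dimension to be singled out, as the paragraph above describes.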
The above embodiment is described further below using several specific application scenarios.
First application scenario: incubator evaluation.
Refer to Fig. 2, which is a schematic diagram of the composition of dimension weights for incubator evaluation in an embodiment of the present invention.
For example, refer to Fig. 3, which is a schematic diagram of the composition of analysis dimensions for incubator evaluation in an embodiment of the present invention. For a given incubator, 8 analysis dimensions are determined: overall operation index, employment promotion index, resident enterprise competitiveness index, institution regional expansion index, media attention index, company operation index, high-quality project incubation index, and investor popularity index.
First, after the data of each incubation institution in the incubator are acquired, the data of each institution are preprocessed, which includes splitting and structuring, quantization, screening, and supplementation.
Then, the data indicators are fused into aggregation factors, which are classified into the 8 analysis dimensions above.
For example, Fig. 2 lists incubator income, incubator profit, resident enterprises, number of incubator employees, number of high-tech enterprises among the incubated enterprises, expansion index, and so on. The data indicators listed in Fig. 2 are only illustrative; the data indicators acquired will differ according to actual conditions. The percentage corresponding to each data indicator in Fig. 2 is its weight value, which the user can also set according to actual requirements.
As can be seen from Fig. 2, the acquired raw data are relatively large in volume and contain many data indicators, and some of the data cannot be used directly in subsequent calculation. Therefore, in the embodiments of the present invention, the raw data are processed, fused, and classified to obtain values under the 8 determined analysis dimensions, which not only organizes the data by theme but also reflects the operating situation of the incubation institutions comprehensively.
Finally, the ranking grade is calculated according to the weight value of each analysis dimension.
Second application scenario: maker space evaluation.
Refer to Fig. 4, which is a schematic diagram of the composition of dimension weights for maker space evaluation in an embodiment of the present invention.
For example, refer to Fig. 5, which is a schematic diagram of the composition of analysis dimensions for maker space evaluation in an embodiment of the present invention. For a given maker space, 9 analysis dimensions are determined: overall operation index, employment promotion index, resident enterprise competitiveness index, institution regional expansion index, media attention index, co-working index, training and coaching index, investor popularity index, and popular industry coverage index.
Fig. 4 also illustrates a variety of data indicators, such as maker space income, resident enterprises, resident startup teams, and the number of people employed from society. Different or additional data indicators can of course be acquired for a maker space according to actual conditions, and the percentage corresponding to each data indicator can also be set by the user; the embodiments of the present invention place no limitation on this.
In this way, the incubation institutions in an incubator or maker space can be evaluated. Objective, quantitative, multi-dimensional analysis measures every aspect of an institution's operating situation, avoids subjective and vague evaluation, and improves evaluation efficiency and objectivity. Institutions that perform outstandingly overall or in a particular respect can be identified quickly, and the areas in which an institution is underdeveloped can be understood, so that targeted improvement measures can be applied and, through closer cooperation, the overall development of incubation institutions can be promoted. This provides significant guidance for the management and development of incubation institutions.
Based on the above embodiment, and referring to Fig. 6, the data analysis device in an embodiment of the present invention specifically includes:
a fusion unit 60, configured to fuse, for the acquired data of each incubation institution and based on the first preset correlation analysis method and the first preset feature engineering method, the data indicators whose similarity is greater than a preset threshold, to obtain fused aggregation factors;
a classification unit 61, configured to classify each aggregation factor into one of the determined analysis dimensions;
a calculation unit 62, configured to normalize the aggregation factors classified into each analysis dimension, and to calculate the ranking grade of each incubation institution according to the preset weight value of each analysis dimension.
Preferably, the device further includes a preprocessing unit 63 configured to, before the data indicators whose similarity is greater than the preset threshold are fused:
extract, from the data of each incubation institution, features that can be used for comparison, and quantify the qualitative data indicators;
select, using a preset screening strategy, the data that satisfy the preset screening strategy, and supplement the values of missing data indicators using a preset comparison method and an analysis of similar observations.
Preferably, the first preset correlation analysis method is Pearson correlation analysis.
Preferably, when fusing the data indicators whose similarity is greater than the preset threshold to obtain the fused aggregation factors, the fusion unit 60 is specifically configured to:
perform derivation on each data indicator whose order of magnitude is greater than a preset value, and calculate, using a preset operation method, the operation result of the data indicators whose similarity is greater than the preset value, to obtain the fused aggregation factors.
Preferably, when classifying each aggregation factor into one of the determined analysis dimensions, the classification unit 61 is specifically configured to:
classify each aggregation factor into one of the determined analysis dimensions using the second preset correlation analysis method and the second preset feature engineering method.
In summary, in the embodiments of the present invention, for the acquired data of each incubator, the data indicators whose similarity is greater than a preset threshold are fused, based on a first preset correlation analysis method and a first preset feature engineering method, to obtain the fused aggregation factors; each aggregation factor is classified into one of the determined analysis dimensions; and the aggregation factors classified into each analysis dimension are normalized, and the ranking level of each incubator is calculated according to the preset weight of each analysis dimension. In this way, all of the data can be fully used and every aspect of the data is taken into account when designing the evaluation system. The indicators are finally fused and classified under a small number of determined analysis dimensions, so that incubators are selected on the basis of objective figures rather than subjective judgment, which improves evaluation efficiency and helps ensure objectivity and fairness.
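The normalization and weighted-ranking steps summarized above can be sketched as follows. This is an illustrative reading, not the patented implementation: min-max normalization, averaging the factors within a dimension, the dimension names, and the weights are all assumptions.

```python
def min_max(values):
    """Normalize a list of values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def dimension_scores(factors):
    """Average the normalized factors of one analysis dimension, per incubator."""
    normalized = [min_max(factor) for factor in factors]
    return [sum(column) / len(column) for column in zip(*normalized)]


def rank_incubators(names, dimension_factors, weights):
    """Return (name, score) pairs sorted from highest to lowest weighted score."""
    totals = [0.0] * len(names)
    for dimension, factors in dimension_factors.items():
        for i, score in enumerate(dimension_scores(factors)):
            totals[i] += weights[dimension] * score
    return sorted(zip(names, totals), key=lambda pair: pair[1], reverse=True)


# Hypothetical incubators, aggregation factors per dimension, and weights.
names = ["A", "B", "C"]
dimension_factors = {
    "scale": [[0.0, 50.0, 100.0]],
    "services": [[3, 9, 6], [10, 30, 20]],
}
weights = {"scale": 0.6, "services": 0.4}

ranking = rank_incubators(names, dimension_factors, weights)
```

With these numbers incubator C ranks first, followed by B and then A. Changing the preset weights changes the ordering without touching the underlying factors.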
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the embodiments of the present invention without departing from the spirit and scope of the embodiments of the present invention. Thus, if these changes and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.
Claims (10)
1. A data analysis method, characterized by comprising:
for the acquired data of each incubator, fusing, based on a first preset correlation analysis method and a first preset feature engineering method, the data indicators whose similarity is greater than a preset threshold, to obtain fused aggregation factors;
classifying each aggregation factor into one of the determined analysis dimensions;
normalizing the aggregation factors classified into each analysis dimension, and calculating the ranking level of each incubator according to the preset weight of each analysis dimension.
2. The method according to claim 1, characterized in that, before fusing the data indicators whose similarity is greater than the preset threshold, the method further comprises:
extracting, from the data of each incubator, features that can be used for comparison, and quantifying the qualitative data indicators;
selecting, using a preset screening strategy, the data that satisfy the preset screening strategy, and supplementing the values of missing data indicators using a preset comparison method and a similarity-analysis observation method.
3. The method according to claim 2, characterized in that the first preset correlation analysis method is Pearson correlation analysis.
4. The method according to claim 2, characterized in that fusing the data indicators whose similarity is greater than the preset threshold to obtain the fused aggregation factors specifically comprises:
for data indicators whose order of magnitude is greater than a preset value, performing derivation on each of them, and calculating, using a preset operation method, the operation result of the data indicators whose similarity is greater than the preset value, to obtain the fused aggregation factors.
5. The method according to any one of claims 1-4, characterized in that classifying each aggregation factor into one of the determined analysis dimensions specifically comprises:
classifying each aggregation factor into one of the determined analysis dimensions using a second preset correlation analysis method and a second preset feature engineering method.
6. A data analysis device, characterized by comprising:
a fusion unit, configured to, for the acquired data of each incubator, fuse, based on a first preset correlation analysis method and a first preset feature engineering method, the data indicators whose similarity is greater than a preset threshold, to obtain fused aggregation factors;
a classification unit, configured to classify each aggregation factor into one of the determined analysis dimensions;
a calculation unit, configured to normalize the aggregation factors classified into each analysis dimension, and to calculate the ranking level of each incubator according to the preset weight of each analysis dimension.
7. The device according to claim 6, characterized in that the device further comprises a preprocessing unit configured to, before the data indicators whose similarity is greater than the preset threshold are fused:
extract, from the data of each incubator, features that can be used for comparison, and quantify the qualitative data indicators;
select, using a preset screening strategy, the data that satisfy the preset screening strategy, and supplement the values of missing data indicators using a preset comparison method and a similarity-analysis observation method.
8. The device according to claim 7, characterized in that the first preset correlation analysis method is Pearson correlation analysis.
9. The device according to claim 7, characterized in that, when fusing the data indicators whose similarity is greater than the preset threshold to obtain the fused aggregation factors, the fusion unit is specifically configured to:
for data indicators whose order of magnitude is greater than a preset value, perform derivation on each of them, and calculate, using a preset operation method, the operation result of the data indicators whose similarity is greater than the preset value, to obtain the fused aggregation factors.
10. The device according to any one of claims 6-9, characterized in that, when classifying each aggregation factor into one of the determined analysis dimensions, the classification unit is specifically configured to:
classify each aggregation factor into one of the determined analysis dimensions using a second preset correlation analysis method and a second preset feature engineering method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710108744.9A CN106940836A (en) | 2017-02-27 | 2017-02-27 | A kind of data analysing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710108744.9A CN106940836A (en) | 2017-02-27 | 2017-02-27 | A kind of data analysing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106940836A true CN106940836A (en) | 2017-07-11 |
Family
ID=59468959
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710108744.9A Pending CN106940836A (en) | 2017-02-27 | 2017-02-27 | A kind of data analysing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106940836A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019134579A1 (en) * | 2018-01-04 | 2019-07-11 | 深圳壹账通智能科技有限公司 | Method, electronic device, and computer readable storage medium for selecting investment target |
CN110458447A (en) * | 2019-08-07 | 2019-11-15 | 软通动力信息技术有限公司 | Innovative space evaluation method, device, computer equipment and storage medium |
CN111209997A (en) * | 2018-11-22 | 2020-05-29 | 北京国双科技有限公司 | Data analysis method and device |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019134579A1 (en) * | 2018-01-04 | 2019-07-11 | 深圳壹账通智能科技有限公司 | Method, electronic device, and computer readable storage medium for selecting investment target |
CN111209997A (en) * | 2018-11-22 | 2020-05-29 | 北京国双科技有限公司 | Data analysis method and device |
CN111209997B (en) * | 2018-11-22 | 2023-04-07 | 北京国双科技有限公司 | Data analysis method and device |
CN110458447A (en) * | 2019-08-07 | 2019-11-15 | 软通动力信息技术有限公司 | Innovative space evaluation method, device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Fu et al. | Unbalanced double hierarchy linguistic term set: The TOPSIS method for multi-expert qualitative decision making involving green mine selection | |
CN107633265A (en) | For optimizing the data processing method and device of credit evaluation model | |
US10083263B2 (en) | Automatic modeling farmer | |
CN112541532B (en) | Target detection method based on dense connection structure | |
CN109657721A (en) | A kind of multi-class decision-making technique of combination fuzzy set and random forest tree | |
CN106940836A (en) | A kind of data analysing method and device | |
CN107908536A (en) | To the performance estimating method and system of GPU applications in CPU GPU isomerous environments | |
CN111199469A (en) | User payment model generation method and device and electronic equipment | |
CN110781174A (en) | Feature engineering modeling method and system using pca and feature intersection | |
CN103207804B (en) | Based on the MapReduce load simulation method of group operation daily record | |
CN110363662A (en) | A kind of personal credit points-scoring system | |
CN114385465A (en) | Fault prediction method, equipment and storage medium | |
Hoque et al. | Efficiency measurement on banking sector in Bangladesh | |
CN110866694A (en) | Power grid construction project financial evaluation system and method | |
CN108805152A (en) | A kind of scene classification method and device | |
CN104102716A (en) | Imbalance data predicting method based on cluster stratified sampling compensation logic regression | |
CN1936887A (en) | Automatic text classification method based on classification concept space | |
CN113159419A (en) | Group feature portrait analysis method, device and equipment and readable storage medium | |
CN111930815A (en) | Method and system for constructing enterprise portrait based on industry attribute and business attribute | |
Shi | A Machine Learning Study on the Model Performance of Human Resources Predictive Algorithms | |
CN111325431A (en) | Method for evaluating satellite system integration maturity | |
CN114510518B (en) | Self-adaptive aggregation method and system for massive structured data and electronic equipment | |
Nabahat | Two-stage DEA with fuzzy data | |
Yangailo et al. | The Impact of Industrialisation on Zambia’s Economic Growth | |
Mohamadi Zanjirani et al. | Strategies for developing native digital games: integrating theme analysis and mathematical programming approaches |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170711 |