CN106971011A - A kind of big data analysis method based on cloud platform - Google Patents
- Publication number
- CN106971011A (application CN201710356074.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- analysis
- framework
- big data
- big
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Fuzzy Systems (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the invention disclose a big data analysis method based on a cloud platform. The method includes: determining data analysis objectives and a plan; creating, according to the determined objectives and plan, a cloud-platform-based analysis framework for the big data; acquiring the big data to be analyzed and performing data preparation and processing; filtering the data to obtain complete, non-duplicate data; clustering and analyzing the data; and testing, verifying, evaluating and deploying the results. The embodiments of the present invention improve the accuracy, timeliness and flexibility of big data analysis.
Description
Technical field
The present invention relates to the technical field of big data analysis, and in particular to a big data analysis method based on a cloud platform.
Background
With the industrialization of society and steadily advancing informatization, data has replaced computation as the center of information processing, and cloud computing and big data have become a major trend. This trend involves many aspects, including storage capacity, availability, I/O performance, data security and scalability. Big data refers to data sets of very large scale and complexity. Big data has four Vs: Volume (large quantity), meaning the amount of data keeps growing; Velocity (high speed), meaning data I/O keeps getting faster; Variety (diversity), meaning data types and sources are diverse; and Value, meaning the data has usable value in many respects. Big data contains massive amounts of information, so the best approach is distributed analysis and mining of the data resources available within that information. However, prior-art distributed data systems and their associated databases cannot keep up with growing data volumes and analysis and mining demands. Their data processing efficiency is insufficient and their responses are not timely enough, because they cannot effectively acquire, store, manage, mine and analyze data with these characteristics. As a result, it is difficult for them to achieve accurate, timely and flexible data processing.
Therefore, to meet the challenges of the big data era, the art needs a big data information analysis method that effectively solves the above technical problems. Such a method should improve the accuracy, timeliness and flexibility of big data analysis and thereby raise its quality.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a big data analysis method based on a cloud platform that improves the accuracy, timeliness and flexibility of big data analysis.
To achieve the above purpose, an embodiment of the present invention discloses a big data analysis method based on a cloud platform. The method includes:
determining data analysis objectives and a plan;
creating, according to the determined data analysis objectives and plan, a cloud-platform-based analysis framework for the big data;
acquiring the big data to be analyzed, and performing data preparation and processing;
filtering the data to obtain complete, non-duplicate data;
clustering the data and analyzing the data;
testing, verifying, evaluating and deploying the results.
Optionally, mining analysis requirements and attribute objects are determined according to the different features, characteristics and/or attributes of different data.
Optionally, the analysis framework may be a centralized data processing framework or a distributed data processing framework.
Optionally, the analysis framework may be any form of framework based on the characteristics of the big data.
Optionally, acquiring the big data to be analyzed and performing data preparation and processing includes:
first posting the data, in order to process it;
storing the data;
converting the data into a format consisting of a pair of values in binary format;
obtaining the identifiers of the data and the corresponding descriptions;
updating the data every predetermined period, while ensuring that not all of the data is posted.
Optionally, the period is set manually, or automatically by a machine, as needed or according to the data characteristics.
Optionally, clustering and analyzing the data includes:
identifying associated data;
determining each data point to be processed;
reducing the data volume using a clustering machine learning algorithm;
analyzing the data set using the clustering machine learning algorithm.
Optionally, clustering and analyzing the data includes:
for each data point to be processed, generating a pair of values in binary format;
the pair of values in binary format further including a cluster identifier and the coordinate values of the data point;
generating the sum of the inputs for each cluster;
sending the values related to the same cluster;
storing the clustering result as unrelated data.
Optionally, the machine learning algorithm is a mean algorithm.
Optionally, filtering the data to obtain complete, non-duplicate data includes:
filtering the data in a Hadoop distributed mode to obtain complete, non-duplicate data.
The big data analysis method based on a cloud platform provided by the embodiments of the present invention works as follows:
- data analysis objectives and a plan are determined;
- a cloud-platform-based analysis framework for the big data is created according to the determined objectives and plan;
- the big data to be analyzed is acquired, and data preparation and processing are performed;
- the data are filtered to obtain complete, non-duplicate data;
- the data are clustered and analyzed;
- the results are tested, verified, evaluated and deployed.

The method can thus meet the challenges of the big data era and improve the accuracy, timeliness and flexibility of big data analysis.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing them are briefly introduced below. The drawings described below are only some embodiments of the present invention. A person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a big data analysis method based on a cloud platform according to an embodiment of the present invention.
Fig. 2 is a flowchart of step S103 in Fig. 1 according to an embodiment of the present invention.
Fig. 3 is a flowchart of step S105 in Fig. 1 according to an embodiment of the present invention.
Fig. 4 is another flowchart of step S105 in Fig. 1 according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of a big data analysis method based on a cloud platform according to an embodiment of the present invention. As shown in Fig. 1, the method may include the following steps:
S101: determine the data analysis objectives and plan;
S102: create, according to the determined data analysis objectives and plan, a cloud-platform-based analysis framework for the big data;
S103: acquire the big data to be analyzed, and perform data preparation and processing;
S104: filter the data to obtain complete, non-duplicate data;
S105: cluster the data and analyze the data;
S106: test, verify, evaluate and deploy the results.
According to an embodiment of the present invention, step S101 first determines the data analysis objectives and plan. Mining analysis requirements and attribute objects are determined according to the different features, characteristics and/or attributes of different data, because each kind of data has its own features, characteristics and/or attributes. For example:
- social media big data is based on interpersonal interaction;
- military news big data implicitly contains, or concentrates, data on military weapons or military trends;
- social news big data reflects the spin and the ideological leanings of the people who publish it;
- technical news big data for a country, region or research institution contains its research priorities, its personnel and funding allocation, its output efficiency, its possible scope of application, and its leading role in or influence on the research and application field.

In these contexts, mining analysis requirements and attribute objects must be set for each kind of data. This makes the big data analysis more targeted and lays a solid foundation for the accuracy of the subsequent cluster analysis.
Second, step S102 creates a big data analysis framework according to the determined data analysis objectives and plan. Specifically, the analysis framework may be any form of framework based on the characteristics of the big data. Different data have different features, characteristics and/or attributes, so the framework can be built specifically for them. The framework may be of any kind, for example, but not limited to, a centralized or a distributed data processing framework. Other forms of framework may also be used, provided they are based on the characteristics of the big data.
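The description leaves the framework design open. Below is a hypothetical sketch of one way centralized and distributed frameworks could share a single interface that runs the same analysis task over data partitions. It is an illustration only, not the patent's design, and rests on these assumptions:
- The class names are illustrative.
- The thread pool is a single-machine stand-in for cluster workers.
- The size threshold in `choose_framework` is invented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class CentralizedFramework:
    """Hypothetical centralized framework: every partition is processed in one process."""

    def run(self, task: Callable[[T], R], partitions: Iterable[T]) -> List[R]:
        return [task(p) for p in partitions]


class DistributedFramework:
    """Hypothetical stand-in for a distributed framework.

    A thread pool imitates independent workers on one machine; a real
    deployment would submit the same task to cluster nodes instead.
    """

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def run(self, task: Callable[[T], R], partitions: Iterable[T]) -> List[R]:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # Executor.map preserves input order, so results line up with partitions
            return list(pool.map(task, partitions))


def choose_framework(total_records: int, threshold: int = 1_000_000):
    """Hypothetical rule: use the distributed variant at or above a size threshold."""
    if total_records >= threshold:
        return DistributedFramework()
    return CentralizedFramework()
```

Both variants expose the same `run` method, so an analysis task written for one can be reused unchanged on the other.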
Next, step S103 acquires the big data to be analyzed and performs data preparation and processing. Fig. 2 is a flowchart of S103 according to an embodiment of the present invention. As shown in Fig. 2, the big data to be analyzed is acquired, prepared and processed; data preparation safeguards the subsequent analysis. Specifically, the following steps may be included:
- A1: to process the data, first post the data;
- A2: store the data;
- A3: convert the data into a format consisting of a pair of values in binary format;
- A4: obtain the identifiers of the data and the corresponding descriptions;
- A5: update the data every predetermined period, while ensuring that not all of the data is posted. The period may be set manually, or automatically by a machine, as needed or according to the data characteristics.

Through the above steps, the data undergoes preliminary processing in preparation for accurate analysis.
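Steps A3 to A5 are stated only abstractly. The sketch below is one possible reading and rests on these assumptions:
- The "pair of values in binary format" is a key/value pair of bytes: the key is the record identifier packed as an unsigned 64-bit big-endian integer, and the value is the payload encoded as JSON.
- A4's identifier-to-description mapping is kept in a dictionary.
- A5's requirement that not all data be posted is read as an incremental update: each period re-posts only records changed since the last update, capped at a batch size.

`to_binary_pair`, `from_binary_pair` and `PreparedStore` are hypothetical names.

```python
import json
import struct
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def to_binary_pair(record_id: int, payload: dict) -> Tuple[bytes, bytes]:
    """A3 (assumed reading): encode a record as a (key, value) pair of bytes."""
    key = struct.pack(">Q", record_id)  # unsigned 64-bit, big-endian
    value = json.dumps(payload, sort_keys=True).encode("utf-8")
    return key, value


def from_binary_pair(key: bytes, value: bytes) -> Tuple[int, dict]:
    """Inverse of to_binary_pair."""
    (record_id,) = struct.unpack(">Q", key)
    return record_id, json.loads(value.decode("utf-8"))


@dataclass
class PreparedStore:
    """Hypothetical store covering A2 (storage) and A4 (identifier -> description)."""
    pairs: Dict[bytes, bytes] = field(default_factory=dict)
    descriptions: Dict[int, str] = field(default_factory=dict)
    versions: Dict[int, int] = field(default_factory=dict)

    def put(self, record_id: int, payload: dict, description: str, version: int) -> None:
        key, value = to_binary_pair(record_id, payload)
        self.pairs[key] = value
        self.descriptions[record_id] = description
        self.versions[record_id] = version

    def changed_since(self, last_version: int, limit: int) -> List[int]:
        """A5 (assumed reading): ids changed since the last update, at most `limit` of them."""
        changed = sorted(r for r, v in self.versions.items() if v > last_version)
        return changed[:limit]
```

Sorting the JSON keys makes the encoded value deterministic, so identical payloads always produce identical bytes.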
Next, step S104 may filter the data in a Hadoop distributed mode to obtain complete, non-duplicate data.
Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. It lets users develop distributed programs without understanding the low-level details of distribution, making full use of a cluster's power for high-speed computation and storage.
Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault tolerant and designed to be deployed on low-cost hardware. It provides high-throughput access to application data, which suits applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream (streaming access).
The core of the Hadoop framework design is HDFS and MapReduce: HDFS provides storage for massive data, and MapReduce provides computation over it.
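Step S104 names Hadoop but not the filtering logic. The single-process sketch below imitates the map, shuffle and reduce phases to show one way complete, non-duplicate records could be obtained. It does not use Hadoop's API and rests on these assumptions:
- A record is complete only when the hypothetical fields `id`, `source` and `text` are all non-empty.
- Two records are duplicates when those fields match.

In a real job the mapper and reducer would run on cluster nodes, for example via Hadoop Streaming, and the framework itself would perform the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]
Key = Tuple[str, ...]

# Hypothetical schema: a record is "complete" only if all of these fields are non-empty.
REQUIRED_FIELDS = ("id", "source", "text")


def mapper(record: Record) -> Iterator[Tuple[Key, Record]]:
    """Map phase: drop incomplete records, key the rest by their required fields."""
    if all(record.get(f) for f in REQUIRED_FIELDS):
        yield tuple(record[f] for f in REQUIRED_FIELDS), record


def shuffle(pairs: Iterable[Tuple[Key, Record]]) -> Dict[Key, List[Record]]:
    """Shuffle phase: group values by key (the framework does this in a real job)."""
    groups: Dict[Key, List[Record]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: Key, values: List[Record]) -> Iterator[Record]:
    """Reduce phase: emit a single record per key, discarding the duplicates."""
    yield values[0]


def filter_and_deduplicate(records: Iterable[Record]) -> List[Record]:
    mapped = (pair for record in records for pair in mapper(record))
    result: List[Record] = []
    for key, values in shuffle(mapped).items():
        result.extend(reducer(key, values))
    return result
```

Keying by the fields that define identity means the shuffle places all duplicates in the same group, so the reducer can drop them without comparing records across groups.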
Next, step S105 clusters and analyzes the data. According to an embodiment of the present invention, Fig. 3 is a flowchart of S105 illustrating how the data are clustered and analyzed. Specifically, the following steps may be included:
- B1: identify associated data;
- B2: determine each data point to be processed;
- B3: reduce the data volume using a clustering machine learning algorithm;
- B4: analyze the data set using the clustering machine learning algorithm.
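Steps B3 and B4 use a clustering algorithm to reduce the data volume. Below is a minimal pure-Python version of Lloyd's k-means. It assumes the "mean algorithm" refers to k-means, which the patent does not state explicitly. `summarize` shows the volume reduction by replacing N points with k (centroid, member count) pairs. The function names and the fixed random seed are illustrative choices.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Assignment step: index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0):
    """Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Update step: centroid = mean of its members; keep it if the cluster is empty
            updated.append(tuple(sum(d) / len(members) for d in zip(*members))
                           if members else centroids[c])
        if updated == centroids:  # converged: assignments no longer change
            break
        centroids = updated
    return centroids, assign(points, centroids)


def summarize(points: List[Point], k: int) -> List[Tuple[Point, int]]:
    """B3 (assumed reading): replace N points by k (centroid, member count) pairs."""
    centroids, labels = kmeans(points, k)
    return [(c, labels.count(i)) for i, c in enumerate(centroids)]
```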
In addition, Fig. 4 is another flowchart of S105 according to an embodiment of the present invention. As shown in Fig. 4, clustering and analyzing the data may include the following steps:
- B1: identify associated data;
- B2: determine each data point to be processed;
- B3: reduce the data volume using a clustering machine learning algorithm;
- B4: analyze the data set using the clustering machine learning algorithm;
- B5: for each data point to be processed, generate a pair of values in binary format;
- B6: the pair of values in binary format further includes a cluster identifier and the coordinate values of the data point;
- B7: generate the sum of the inputs for each cluster;
- B8: send the values related to the same cluster;
- B9: store the clustering result as unrelated data.

Through the above steps, the data obtained from the big data are analyzed in detail, which greatly improves the accuracy of big data analysis. Preferably, in steps B3 and B4, the machine learning algorithm may be, for example, a mean algorithm.
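Steps B5 to B8 resemble the usual MapReduce formulation of one k-means iteration. Each point is emitted with the identifier of its nearest cluster. Partial sums are then formed per cluster, and partial sums sharing a cluster identifier are combined to compute the new centroids. The sketch below simulates one such iteration over two input splits in a single process, assuming that reading of the steps. B9 is represented only by returning one independent record per cluster. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Tuple[List[float], int]  # (per-dimension coordinate sums, number of points)


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    return min(range(len(centroids)),
               key=lambda c: sum((x - y) ** 2 for x, y in zip(point, centroids[c])))


def map_split(points: List[Point], centroids: Sequence[Point]) -> List[Tuple[int, Point]]:
    """B5/B6: emit one (cluster identifier, coordinates) pair per data point."""
    return [(nearest(p, centroids), p) for p in points]


def combine(pairs: List[Tuple[int, Point]]) -> Dict[int, Partial]:
    """B7: sum the inputs of each cluster locally, within one input split."""
    partial: Dict[int, Partial] = {}
    for cid, p in pairs:
        sums, n = partial.get(cid, ([0.0] * len(p), 0))
        partial[cid] = ([s + x for s, x in zip(sums, p)], n + 1)
    return partial


def reduce_clusters(partials: List[Dict[int, Partial]]) -> Dict[int, Point]:
    """B8: merge partial sums sharing a cluster id, then divide to get new centroids."""
    merged: Dict[int, Partial] = {}
    for part in partials:
        for cid, (sums, n) in part.items():
            total, count = merged.get(cid, ([0.0] * len(sums), 0))
            merged[cid] = ([a + b for a, b in zip(total, sums)], count + n)
    return {cid: tuple(s / n for s in sums) for cid, (sums, n) in sorted(merged.items())}


def kmeans_iteration(splits: List[List[Point]], centroids: Sequence[Point]) -> Dict[int, Point]:
    """One iteration: map and combine per split, then a single reduce."""
    return reduce_clusters([combine(map_split(s, centroids)) for s in splits])
```

Sending (sum, count) partials rather than raw points keeps the data moved between splits and the reducer proportional to the number of clusters, not the number of points.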
Finally, step S106 tests, verifies, evaluates and deploys the results. Any method may be used for this, including existing methods and methods developed in the future.
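S106 leaves the evaluation method open. One common option, which the patent does not specify, is to check a clustering result with two measures: the within-cluster sum of squared errors (SSE) and the mean silhouette coefficient. The sketch computes both in plain Python; `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sse(points: List[Point], labels: List[int], centroids: Sequence[Point]) -> float:
    """Within-cluster sum of squared distances to the assigned centroid (lower = tighter)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient in [-1, 1]; needs at least two distinct labels."""
    clusters = set(labels)
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:  # singleton cluster: silhouette is conventionally 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != own
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)
```

A lower SSE and a silhouette closer to 1 indicate tighter, better-separated clusters. Comparing these values across runs, or across values of k, is one way to decide whether a result is ready to deploy.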
Through the above processing, the big data information analysis method can fully meet the challenges of the big data era and improve the accuracy, timeliness and flexibility of big data analysis.
It should be noted that herein, all relational terms according to first and second or the like are used merely to one
Entity or operation make a distinction with another entity or operation, and not necessarily require or imply between these entities or operation
There is any this actual relation or order.Moreover, term " comprising ", "comprising" or its any other variant are intended to contain
Lid nonexcludability is included, so that process, method, article or equipment including a series of key elements not only will including those
Element, but also other key elements including being not expressly set out, or also include being this process, method, article or equipment
Intrinsic key element.In the absence of more restrictions, the key element limited by sentence "including a ...", it is not excluded that
Also there is other identical element in process, method, article or equipment including the key element.
The embodiments in this specification are described in a related manner, and each focuses on its differences from the others; for identical or similar parts, the embodiments may refer to one another. The device embodiments in particular are described briefly, because they are substantially similar to the method embodiments. For relevant details, refer to the description of the method embodiments.
A person of ordinary skill in the art can understand that all or some of the steps in the above method embodiments can be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit its scope. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention falls within its protection scope.
Claims (10)
1. A big data analysis method based on a cloud platform, characterized in that the method comprises:
determining data analysis objectives and a plan;
creating, according to the determined data analysis objectives and plan, a cloud-platform-based analysis framework for the big data;
acquiring the big data to be analyzed, and performing data preparation and processing;
filtering the data to obtain complete, non-duplicate data;
clustering the data and analyzing the data;
testing, verifying, evaluating and deploying the results.
2. The method according to claim 1, characterized in that mining analysis requirements and attribute objects are determined according to the different features, characteristics and/or attributes of different data.
3. The method according to claim 2, characterized in that the analysis framework is a centralized data processing framework or a distributed data processing framework.
4. The method according to claim 2, characterized in that the analysis framework is any form of framework based on the characteristics of the big data.
5. The method according to any one of claims 1-4, characterized in that acquiring the big data to be analyzed and performing data preparation and processing comprises:
first posting the data, in order to process it;
storing the data;
converting the data into a format consisting of a pair of values in binary format;
obtaining the identifiers of the data and the corresponding descriptions;
updating the data every predetermined period, while ensuring that not all of the data is posted.
6. The method according to claim 5, characterized in that the period is set manually, or automatically by a machine, according to needs or data characteristics.
7. The method according to any one of claims 1-4, characterized in that clustering the data and analyzing the data comprises:
identifying associated data;
determining each data point to be processed;
reducing the data volume using a clustering machine learning algorithm;
analyzing the data set using the clustering machine learning algorithm.
8. The method according to claim 7, characterized in that clustering the data and analyzing the data comprises:
for each data point to be processed, generating a pair of values in binary format;
the pair of values in binary format further comprising a cluster identifier and the coordinate values of the data point;
generating the sum of the inputs for each cluster;
sending the values related to the same cluster;
storing the clustering result as unrelated data.
9. The method according to claim 7 or 8, characterized in that the machine learning algorithm is a mean algorithm.
10. The method according to any one of claims 1-9, characterized in that filtering the data to obtain complete, non-duplicate data comprises:
filtering the data in a Hadoop distributed mode to obtain complete, non-duplicate data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710356074.2A CN106971011A (en) | 2017-05-19 | 2017-05-19 | A kind of big data analysis method based on cloud platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710356074.2A CN106971011A (en) | 2017-05-19 | 2017-05-19 | A kind of big data analysis method based on cloud platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106971011A true CN106971011A (en) | 2017-07-21 |
Family ID: 59325805
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710356074.2A Withdrawn CN106971011A (en) | 2017-05-19 | 2017-05-19 | A kind of big data analysis method based on cloud platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106971011A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107741879A (en) * | 2017-10-19 | 2018-02-27 | 郑州云海信息技术有限公司 | A kind of big data processing method and its device |
CN108038228A (en) * | 2017-12-25 | 2018-05-15 | 佛山市车品匠汽车用品有限公司 | A kind of method for digging and device based on database |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104320460A (en) * | 2014-10-24 | 2015-01-28 | 西安未来国际信息股份有限公司 | Big data processing method |
CN105260448A (en) * | 2015-10-10 | 2016-01-20 | 成都博元时代软件有限公司 | Big data information analysis method |
CN106202192A (en) * | 2016-06-28 | 2016-12-07 | 浪潮软件集团有限公司 | Workflow-based big data analysis method |
CN106339439A (en) * | 2016-08-22 | 2017-01-18 | 成都众易通科技有限公司 | Big data analysis method |
- 2017-05-19: application CN201710356074.2A filed in CN; publication CN106971011A, status: not active (withdrawn)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110363449B (en) | | Risk identification method, device and system |
CN102591917B (en) | | Data processing method and system and related device |
CN106709012A (en) | | Method and device for analyzing big data |
CN113157448A (en) | | System and method for managing feature processing |
CN105843841A (en) | | Small file storing method and system |
US20150100596A1 (en) | | System and method for performing set operations with defined sketch accuracy distribution |
US10268749B1 (en) | | Clustering sparse high dimensional data using sketches |
CN107577724A (en) | | A kind of big data processing method |
CN107748752A (en) | | A kind of data processing method and device |
CN112765468A (en) | | Personalized user service customization method and device |
CN106971011A (en) | | A kind of big data analysis method based on cloud platform |
Chen | | Higher mathematics teaching resource scheduling system based on cloud computing |
CN107871055A (en) | | A kind of data analysing method and device |
US11783221B2 (en) | | Data exposure for transparency in artificial intelligence |
Gupta et al. | | Feature selection: an overview |
Pranav et al. | | Data mining in cloud computing |
CN108256694A (en) | | Based on Fuzzy time sequence forecasting system, the method and device for repeating genetic algorithm |
CN112613562B (en) | | Data analysis system and method based on multi-center cloud computing |
CN109032940A (en) | | A kind of test scene input method, device, equipment and storage medium |
Ma | | The Research of Stock Predictive Model based on the Combination of CART and DBSCAN |
CN108268620A (en) | | A kind of Document Classification Method based on hadoop data minings |
Zhang et al. | | Self‐Adaptive K‐Means Based on a Covering Algorithm |
CN106528872B (en) | | A kind of data search method under big data environment |
CN108090182B (en) | | A kind of distributed index method and system of extensive high dimensional data |
Han et al. | | Research on data mining and visualization technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WW01 | Invention patent application withdrawn after publication | Application publication date: 20170721 |