CN105468676A - Big data processing method - Google Patents

Big data processing method

Info

Publication number
CN105468676A
Authority
CN
China
Prior art keywords
data
calculation result
memory node
resource
clouds
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510780656.4A
Other languages
Chinese (zh)
Inventor
毛力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHENGDU DINGZHIHUI SCIENCE AND TECHNOLOGY CO., LTD.
Original Assignee
SICHUAN JIUCHENG INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SICHUAN JIUCHENG INFORMATION TECHNOLOGY Co Ltd
Priority to CN201510780656.4A
Publication of CN105468676A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a big data processing method. The method comprises the following steps of obtaining big data resources related to a resource request message from a cloud according to the received resource request message which is input by a user; downloading the obtained big data resources from the cloud; and carrying out classification and data storage on the downloaded big data resources. According to the method, the user can obtain the needed big data resources timely and effectively; and effective data analysis and processing can be carried out.

Description

Big data processing method
Technical field
The present invention relates to the field of big data, and in particular to a big data processing method.
Background
In recent years, with the rapid development and widespread adoption of computer and information technology, industry application systems have expanded rapidly in scale, and the data they produce has grown explosively. Big data at industry or enterprise scale readily reaches hundreds of terabytes, or even tens to hundreds of petabytes, far exceeding the processing capacity of traditional computing technology and information systems. Finding effective big data processing technologies, methods, and means has therefore become an urgent real-world need. "Big data" has existed for some time in fields such as physics, biology, and environmental ecology, and in industries such as the military, finance, and communications, but it has drawn wide attention only in recent years with the growth of the internet and the information industry.
The purpose of big data processing is to let users obtain the big data resources they need in a timely and effective manner. Demand is high in scenarios such as cloud computing and distributed computing on the internet, but the prior art still lacks an effective big data processing method.
Summary of the invention
The object of the present invention is to provide a big data processing method that enables users to obtain the required big data resources in a timely and effective manner, and to analyze and process the data effectively.
The object of the present invention is achieved through the following technical solution:
A big data processing method, characterized by comprising the following steps:
Step 1: receive resource request information input by a user;
Step 2: according to the resource request information, obtain from the cloud the big data resources related to the resource request information;
Step 3: the user downloads the obtained big data resources from the cloud;
Step 4: classify the downloaded big data resources;
Step 5: store the classified big data resources.
Optionally, step 2 comprises the following steps:
Step 2.1: a computing management node obtains the resource request information from the cloud;
Step 2.2: the computing management node assigns multiple distributed computing nodes to perform distributed computation according to the resource request information, so that each distributed computing node generates its own local computation result;
Step 2.3: the computing management node integrates the local computation results of the distributed computing nodes into a global computation result and sends the global computation result to the cloud.
Optionally, step 2.3 comprises the following steps:
Step 2.3.1: the computing management node sorts the local computation results of the distributed computing nodes according to the composite score K of each node, merges the sorted results, and then removes duplicate data and noise data to obtain the global computation result;
Here, for each distributed computing node, let $K$ be its composite score, $K_1$ its trust score, and $K_2$ its computing power score; then: $K = \left(A + \sqrt{K_1}\right)\left(B + \sqrt{K_2}\right)$;
where $A$ and $B$ are positive integers, and $K_1$ and $K_2$ are greater than zero;
Step 2.3.2: the computing management node sends the global computation result to the cloud as incremental data at fixed time intervals.
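The incremental upload in step 2.3.2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `IncrementalSender` class, the `upload` callback, and the `run_periodically` helper are hypothetical names, and the sketch assumes that "incremental data" means only the records that have not been uploaded in an earlier round.

```python
import time
from typing import Callable, Hashable, Iterable, List, Set


class IncrementalSender:
    """Uploads only records that were not sent before (hypothetical helper)."""

    def __init__(self, upload: Callable[[List[Hashable]], None]) -> None:
        self._upload = upload            # stand-in for the real cloud upload call
        self._sent: Set[Hashable] = set()

    def push(self, global_result: Iterable[Hashable]) -> List[Hashable]:
        # Keep the source order and skip anything already uploaded.
        delta: List[Hashable] = []
        for record in global_result:
            if record not in self._sent:
                self._sent.add(record)
                delta.append(record)
        if delta:                        # nothing new means no upload this round
            self._upload(delta)
        return delta


def run_periodically(sender: IncrementalSender,
                     snapshot: Callable[[], Iterable[Hashable]],
                     interval_s: float,
                     rounds: int,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Push the current global result every `interval_s` seconds, `rounds` times."""
    for _ in range(rounds):
        sender.push(snapshot())
        sleep(interval_s)
```

Tracking sent records in a set keeps each round proportional to the size of the current result; a production system would more likely track a version or timestamp watermark, which the patent does not specify.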
Optionally, step 3 comprises the following steps:
Step 3.1: a data forwarding server divides the global computation result obtained from the cloud into several independent data blocks and records the size of each data block, while storing the data blocks in chronological order into a set of data storage nodes; the set comprises M independent data storage nodes, namely: data storage node 1, data storage node 2, ..., data storage node N, ..., data storage node M;
Step 3.2: after the data forwarding server stores the current data block on data storage node N, data storage node N returns its remaining capacity to the data forwarding server; when the remaining capacity of data storage node N is less than the size of the next data block, the data forwarding server begins storing data blocks on data storage node N+1; and so on, until the entire global computation result has been stored; where N ≤ M, and M and N are positive integers;
Step 3.3: the user downloads the global computation result from the data forwarding server in the cloud, and the global computation result thus obtained is the big data resource.
Optionally, step 4 comprises the following steps:
Step 4.1: randomly sample the attributes of the downloaded big data resources to obtain multiple major-class data sets;
Step 4.2: randomly sample the attributes of each major-class data set to obtain multiple subclass data sets;
Step 4.3: perform cluster analysis on each major-class data set to obtain multiple major-class clustering results and the corresponding major-class labels;
Step 4.4: perform cluster analysis on each subclass data set to obtain multiple subclass clustering results and the corresponding subclass labels;
Step 4.5: output the major-class clustering results and major-class labels and the subclass clustering results and subclass labels, completing the classification of the big data resources.
The beneficial effects of the present invention are as follows: by applying distributed computation, storage, and processing to big data resources, the method improves the computational efficiency of big data processing at low cost, with good continuity of data storage and high security.
Embodiments
The present invention is described in further detail below with reference to an embodiment, but embodiments of the present invention are not limited thereto.
A big data processing method, characterized by comprising the following steps:
Step 1: receive resource request information input by a user;
Step 2: according to the resource request information, obtain from the cloud the big data resources related to the resource request information;
Step 3: the user downloads the obtained big data resources from the cloud;
Step 4: classify the downloaded big data resources;
Step 5: store the classified big data resources.
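The data flow of steps 1 to 5 can be sketched as a small pipeline. This is only an illustrative outline under stated assumptions: `handle_request` and its four callable parameters (`locate`, `download`, `classify`, `store`) are hypothetical names introduced here, and each callable stands in for a stage whose details the text leaves to the optional sub-steps.

```python
from typing import Callable, Dict, List

Resources = List[str]
Classified = Dict[str, List[str]]


def handle_request(request: str,
                   locate: Callable[[str], Resources],
                   download: Callable[[Resources], Resources],
                   classify: Callable[[Resources], Classified],
                   store: Callable[[Classified], None]) -> Classified:
    """Run steps 2 to 5 for a request received in step 1."""
    located = locate(request)            # step 2: find related resources in the cloud
    downloaded = download(located)       # step 3: download them
    classified = classify(downloaded)    # step 4: classify the downloaded resources
    store(classified)                    # step 5: persist the classified resources
    return classified
```

Passing the stages in as callables keeps the outline independent of any particular cloud service or storage backend.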
Optionally, step 2 comprises the following steps:
Step 2.1: a computing management node obtains the resource request information from the cloud;
Step 2.2: the computing management node assigns multiple distributed computing nodes to perform distributed computation according to the resource request information, so that each distributed computing node generates its own local computation result;
Step 2.3: the computing management node integrates the local computation results of the distributed computing nodes into a global computation result and sends the global computation result to the cloud.
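Steps 2.1 to 2.3 describe a scatter and gather pattern, which can be sketched with Python's standard `concurrent.futures` module. The sketch makes several assumptions that are not in the text: worker threads stand in for the distributed computing nodes, each node holds its own data partition, the computation is a simple keyword match (`keyword_worker`), and integration is plain concatenation in node-name order. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def keyword_worker(request: str, data: Sequence[str]) -> List[str]:
    """Toy local computation: keep the items that mention the request."""
    return [item for item in data if request in item]


def run_distributed(request: str,
                    partitions: Dict[str, Sequence[str]],
                    worker: Callable[[str, Sequence[str]], List[str]] = keyword_worker
                    ) -> Dict[str, List[str]]:
    """Manager side: hand the request to every node and collect local results."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        futures = {node: pool.submit(worker, request, data)
                   for node, data in partitions.items()}
        # result() blocks until that node's local computation has finished.
        return {node: future.result() for node, future in futures.items()}


def integrate(local_results: Dict[str, List[str]]) -> List[str]:
    """Combine local results into one global result (simple concatenation)."""
    merged: List[str] = []
    for node in sorted(local_results):
        merged.extend(local_results[node])
    return merged
```

In a real deployment the workers would run on separate machines behind an RPC or job-queue layer; the thread pool only illustrates the dispatch and collection order.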
Optionally, step 2.3 comprises the following steps:
Step 2.3.1: the computing management node sorts the local computation results of the distributed computing nodes according to the composite score K of each node, merges the sorted results, and then removes duplicate data and noise data to obtain the global computation result;
Here, for each distributed computing node, let $K$ be its composite score, $K_1$ its trust score, and $K_2$ its computing power score; then: $K = \left(A + \sqrt{K_1}\right)\left(B + \sqrt{K_2}\right)$;
Among the parameters above, $A$ and $B$ take positive integer values, and $K_1$ and $K_2$ are positive numbers;
The trust score $K_1$ and the computing power score $K_2$ of each distributed computing node are known in advance; the trust score $K_1$ is related to the historical data access record of the distributed computing node, and the computing power score $K_2$ is related to the hardware computing capability of the node;
The parameters $A$ and $B$ are tuning parameters; they may be constants, or may be adjusted as needed in practice.
Step 2.3.2: the computing management node sends the global computation result to the cloud as incremental data at fixed time intervals.
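The scoring and merging in step 2.3.1 can be sketched as below. Several points are assumptions rather than statements from the text: the exponent 1/2 in the formula is read as a square root, higher-scoring nodes are placed first, duplicates are detected by equality, and "noise" is decided by a caller-supplied predicate (by default, `None` records). The names `NodeResult`, `composite_score`, and `merge_results` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Hashable, List, Optional, Sequence


@dataclass
class NodeResult:
    node_id: str
    trust: float                              # K1, must be > 0
    power: float                              # K2, must be > 0
    records: Sequence[Optional[Hashable]]     # local computation result


def composite_score(k1: float, k2: float, a: int = 1, b: int = 1) -> float:
    """K = (A + sqrt(K1)) * (B + sqrt(K2)), with A, B positive integers."""
    if k1 <= 0 or k2 <= 0:
        raise ValueError("K1 and K2 must be greater than zero")
    if not (isinstance(a, int) and isinstance(b, int) and a >= 1 and b >= 1):
        raise ValueError("A and B must be positive integers")
    return (a + math.sqrt(k1)) * (b + math.sqrt(k2))


def merge_results(results: Sequence[NodeResult], a: int = 1, b: int = 1,
                  is_noise: Callable[[object], bool] = lambda r: r is None
                  ) -> List[Hashable]:
    """Order nodes by K (highest first, an assumption), merge, drop duplicates and noise."""
    ranked = sorted(results,
                    key=lambda n: composite_score(n.trust, n.power, a, b),
                    reverse=True)
    seen = set()
    merged: List[Hashable] = []
    for node in ranked:
        for record in node.records:
            if is_noise(record) or record in seen:
                continue
            seen.add(record)
            merged.append(record)
    return merged
```

For example, with A = 1, B = 2, K1 = 4 and K2 = 9, the score is (1 + 2) × (2 + 3) = 15. Because duplicates are dropped on first sight, the copy that survives is the one from the higher-scoring node.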
Optionally, step 3 comprises the following steps:
Step 3.1: a data forwarding server divides the global computation result obtained from the cloud into several independent data blocks and records the size of each data block, while storing the data blocks in chronological order into a set of data storage nodes; the set comprises M independent data storage nodes, namely: data storage node 1, data storage node 2, ..., data storage node N, ..., data storage node M;
Step 3.2: after the data forwarding server stores the current data block on data storage node N, data storage node N returns its remaining capacity to the data forwarding server; when the remaining capacity of data storage node N is less than the size of the next data block, the data forwarding server begins storing data blocks on data storage node N+1; and so on, until the entire global computation result has been stored; where N ≤ M, and M and N are positive integers;
Step 3.3: the user downloads the global computation result from the data forwarding server in the cloud, and the global computation result thus obtained is the big data resource.
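The block placement in steps 3.1 and 3.2 can be sketched as a sequential fill over the storage nodes. This is a minimal sketch under assumptions: blocks have a fixed size chosen by the caller, capacity is counted in bytes, and a node that is too small for the next block is skipped (the text only describes moving from node N to node N+1). `StorageNode`, `split_blocks`, and `store_sequentially` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class StorageNode:
    capacity: int                                   # total capacity in bytes
    blocks: List[bytes] = field(default_factory=list)

    @property
    def remaining(self) -> int:
        return self.capacity - sum(len(b) for b in self.blocks)

    def store(self, block: bytes) -> int:
        """Store a block and report the remaining capacity back to the caller."""
        if len(block) > self.remaining:
            raise ValueError("block does not fit on this node")
        self.blocks.append(block)
        return self.remaining


def split_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Step 3.1: cut the global result into independent blocks."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store_sequentially(blocks: Sequence[bytes], nodes: Sequence[StorageNode]) -> List[int]:
    """Step 3.2: fill node N until the next block no longer fits, then use node N+1."""
    placement: List[int] = []
    current = 0
    for block in blocks:
        # Move forward only; earlier nodes are never revisited.
        while current < len(nodes) and nodes[current].remaining < len(block):
            current += 1
        if current == len(nodes):
            raise RuntimeError("storage node set is out of capacity")
        nodes[current].store(block)
        placement.append(current)
    return placement
```

Because the node index only moves forward, the blocks stay in their original order across nodes, so the result can be reassembled by reading node 1 through node M in sequence.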
Optionally, step 4 comprises the following steps:
Step 4.1: randomly sample the attributes of the downloaded big data resources to obtain multiple major-class data sets;
Step 4.2: randomly sample the attributes of each major-class data set to obtain multiple subclass data sets;
Step 4.3: perform cluster analysis on each major-class data set to obtain multiple major-class clustering results and the corresponding major-class labels;
Step 4.4: perform cluster analysis on each subclass data set to obtain multiple subclass clustering results and the corresponding subclass labels;
Step 4.5: output the major-class clustering results and major-class labels and the subclass clustering results and subclass labels, completing the classification of the big data resources.
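One possible reading of steps 4.1 to 4.5 is sketched below. The text does not name a sampling scheme or a clustering algorithm, so everything specific here is an assumption: each data set is taken to be a projection of the records onto a randomly chosen subset of attributes, the records are numeric, and a small k-means routine stands in for "cluster analysis". The function names and default parameters are hypothetical.

```python
import random
from typing import Dict, List

Row = Dict[str, float]


def sample_attribute_views(rows: List[Row], n_views: int, n_attrs: int,
                           rng: random.Random) -> List[List[Row]]:
    """Steps 4.1/4.2: build data sets by projecting onto random attribute subsets."""
    attrs = sorted(rows[0])
    views = []
    for _ in range(n_views):
        chosen = sorted(rng.sample(attrs, n_attrs))
        views.append([{a: row[a] for a in chosen} for row in rows])
    return views


def kmeans_labels(rows: List[Row], k: int, rng: random.Random, iters: int = 20) -> List[int]:
    """Steps 4.3/4.4: tiny k-means, returning one cluster label per row."""
    attrs = sorted(rows[0])
    points = [[row[a] for a in attrs] for row in rows]
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                      # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def classify(rows: List[Row], rng: random.Random, n_major: int = 2, major_attrs: int = 3,
             n_sub: int = 2, sub_attrs: int = 2, k: int = 2) -> List[dict]:
    """Step 4.5: return major-class and subclass labels for every sampled data set."""
    output = []
    for major in sample_attribute_views(rows, n_major, major_attrs, rng):
        subs = [{"attrs": sorted(sub[0]), "labels": kmeans_labels(sub, k, rng)}
                for sub in sample_attribute_views(major, n_sub, sub_attrs, rng)]
        output.append({"attrs": sorted(major[0]),
                       "labels": kmeans_labels(major, k, rng),
                       "subs": subs})
    return output
```

Passing an explicit `random.Random` instance makes the sampling reproducible; in practice a library clusterer would replace the hand-written k-means.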
Although the above detailed description shows, describes, and points out the basic novel features of the present disclosure as applied to various implementations, it will be understood that those skilled in the art may make various omissions, substitutions, and changes in the form and details of the system without departing from the spirit of the present disclosure. In addition, the order in which method steps appear in the claims does not imply an order of the steps.

Claims (5)

1. A big data processing method, characterized by comprising the following steps:
Step 1: receive resource request information input by a user;
Step 2: according to the resource request information, obtain from the cloud the big data resources related to the resource request information;
Step 3: the user downloads the obtained big data resources from the cloud;
Step 4: classify the downloaded big data resources;
Step 5: store the classified big data resources.
2. The big data processing method according to claim 1, characterized in that step 2 comprises the following steps:
Step 2.1: a computing management node obtains the resource request information from the cloud;
Step 2.2: the computing management node assigns multiple distributed computing nodes to perform distributed computation according to the resource request information, so that each distributed computing node generates its own local computation result;
Step 2.3: the computing management node integrates the local computation results of the distributed computing nodes into a global computation result and sends the global computation result to the cloud.
3. The big data processing method according to claim 2, characterized in that step 2.3 comprises the following steps:
Step 2.3.1: the computing management node sorts the local computation results of the distributed computing nodes according to the composite score K of each node, merges the sorted results, and then removes duplicate data and noise data to obtain the global computation result;
Here, for each distributed computing node, let $K$ be its composite score, $K_1$ its trust score, and $K_2$ its computing power score; then: $K = \left(A + \sqrt{K_1}\right)\left(B + \sqrt{K_2}\right)$;
where $A$ and $B$ are positive integers, and $K_1$ and $K_2$ are greater than zero;
Step 2.3.2: the computing management node sends the global computation result to the cloud as incremental data at fixed time intervals.
4. The big data processing method according to claim 3, characterized in that step 3 comprises the following steps:
Step 3.1: a data forwarding server divides the global computation result obtained from the cloud into several independent data blocks and records the size of each data block, while storing the data blocks in chronological order into a set of data storage nodes; the set comprises M independent data storage nodes, namely: data storage node 1, data storage node 2, ..., data storage node N, ..., data storage node M;
Step 3.2: after the data forwarding server stores the current data block on data storage node N, data storage node N returns its remaining capacity to the data forwarding server; when the remaining capacity of data storage node N is less than the size of the next data block, the data forwarding server begins storing data blocks on data storage node N+1; and so on, until the entire global computation result has been stored; where N ≤ M, and M and N are positive integers;
Step 3.3: the user downloads the global computation result from the data forwarding server in the cloud, and the global computation result thus obtained is the big data resource.
5. The big data processing method according to claim 4, characterized in that step 4 comprises the following steps:
Step 4.1: randomly sample the attributes of the downloaded big data resources to obtain multiple major-class data sets;
Step 4.2: randomly sample the attributes of each major-class data set to obtain multiple subclass data sets;
Step 4.3: perform cluster analysis on each major-class data set to obtain multiple major-class clustering results and the corresponding major-class labels;
Step 4.4: perform cluster analysis on each subclass data set to obtain multiple subclass clustering results and the corresponding subclass labels;
Step 4.5: output the major-class clustering results and major-class labels and the subclass clustering results and subclass labels, completing the classification of the big data resources.
CN201510780656.4A 2015-11-13 2015-11-13 Big data processing method Pending CN105468676A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510780656.4A CN105468676A (en) 2015-11-13 2015-11-13 Big data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510780656.4A CN105468676A (en) 2015-11-13 2015-11-13 Big data processing method

Publications (1)

Publication Number Publication Date
CN105468676A true CN105468676A (en) 2016-04-06

Family

ID=55606377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510780656.4A Pending CN105468676A (en) 2015-11-13 2015-11-13 Big data processing method

Country Status (1)

Country Link
CN (1) CN105468676A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106534247A (en) * 2016-09-18 2017-03-22 东软集团股份有限公司 Method and device for downloading forms
CN107257292A (en) * 2017-05-26 2017-10-17 河南职业技术学院 A kind of cross-domain distributed big data communication system design planning method
CN107943808A (en) * 2016-10-13 2018-04-20 北京京东尚科信息技术有限公司 The method and apparatus of processing equipment reported data
CN108491456A (en) * 2018-03-02 2018-09-04 西安财经学院 The processing method of purchase information is sold in a kind of insurance service based on big data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279505A (en) * 2013-05-10 2013-09-04 中国南方电网有限责任公司超高压输电公司 Mass data processing method based on semantic meaning
US9031992B1 (en) * 2011-09-30 2015-05-12 Emc Corporation Analyzing big data
CN104881581A (en) * 2015-05-28 2015-09-02 成都艺辰德迅科技有限公司 IoT (Internet of Things) data high-efficiency analysis method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9031992B1 (en) * 2011-09-30 2015-05-12 Emc Corporation Analyzing big data
CN103279505A (en) * 2013-05-10 2013-09-04 中国南方电网有限责任公司超高压输电公司 Mass data processing method based on semantic meaning
CN104881581A (en) * 2015-05-28 2015-09-02 成都艺辰德迅科技有限公司 IoT (Internet of Things) data high-efficiency analysis method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
卿华: "软件管理可以很简单", 《电脑迷》 *
林伟伟等: "《分布式计算、云计算与大数据》", 31 October 2015, 机械工业出版社 *
林树地: "基于Hadoop的决策树分类算法研究", 《中国优秀硕士学位论文全文数据库》 *
熊安萍: "基于对象存储的负载均衡存储策略", 《计算机工程与设计》 *
陈健美等: "《数字图像处理与分析》", 1 March 2015, 江苏大学出版社 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106534247A (en) * 2016-09-18 2017-03-22 东软集团股份有限公司 Method and device for downloading forms
CN107943808A (en) * 2016-10-13 2018-04-20 北京京东尚科信息技术有限公司 The method and apparatus of processing equipment reported data
CN107257292A (en) * 2017-05-26 2017-10-17 河南职业技术学院 A kind of cross-domain distributed big data communication system design planning method
CN107257292B (en) * 2017-05-26 2019-11-19 河南职业技术学院 A kind of cross-domain distributed big data communication system design planning method
CN108491456A (en) * 2018-03-02 2018-09-04 西安财经学院 The processing method of purchase information is sold in a kind of insurance service based on big data

Similar Documents

Publication Publication Date Title
US9152691B2 (en) System and method for performing set operations with defined sketch accuracy distribution
US20180302297A1 (en) Methods and systems for controlling data backup
KR20190075962A (en) Data processing method and data processing apparatus
CN105468676A (en) Big data processing method
CN106815254A (en) A kind of data processing method and device
CN112035549A (en) Data mining method and device, computer equipment and storage medium
CN104615765A (en) Data processing method and data processing device for browsing internet records of mobile subscribers
CN111258978A (en) Data storage method
CN105335368A (en) Product clustering method and apparatus
Ye et al. Big data processing framework for manufacturing
Chen et al. An intelligent approval system for city construction based on cloud computing and big data
CN110427574B (en) Route similarity determination method, device, equipment and medium
CN108833592A (en) Cloud host schedules device optimization method, device, equipment and storage medium
CN103984723A (en) Method used for updating data mining for frequent item by incremental data
CN112182111B (en) Block chain based distributed system layered processing method and electronic equipment
CN113361618A (en) Industrial data joint modeling method and system based on federal learning
CN105138684A (en) Information processing method and device
Gunawardena et al. Real-time Uber data analysis of popular Uber locations in Kubernetes environment
CN116723090A (en) Alarm root cause positioning method and device, electronic equipment and readable storage medium
CN113362090A (en) User behavior data processing method and device
CN109582476A (en) Data processing method, apparatus and system
CN111143456B (en) Spark-based Cassandra data import method, device, equipment and medium
CN110929207B (en) Data processing method, device and computer readable storage medium
Wen et al. Challenges and Opportunities of Building Fast GBDT Systems.
CN105224998A (en) Data processing method and device for pre-estimation model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Xu Chi

Inventor before: Mao Li

TA01 Transfer of patent application right

Effective date of registration: 20170801

Address after: 1, No. 772, No. 7, 610041 floor, No. 1388, Tianfu Road, Chengdu hi tech Zone, Sichuan

Applicant after: CHENGDU DINGZHIHUI SCIENCE AND TECHNOLOGY CO., LTD.

Address before: 610041 A, building, No. two, Science Park, high tech Zone, Sichuan, Chengdu, China 103B

Applicant before: Sichuan Jiucheng Information Technology Co., Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20160406