CN107301094A - Dynamic adaptive data model for large-scale dynamic transaction queries - Google Patents

Dynamic adaptive data model for large-scale dynamic transaction queries

Info

Publication number
CN107301094A
Authority
CN
China
Prior art keywords
data
workload
processing
dynamic
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710325734.0A
Other languages
Chinese (zh)
Inventor
郭蒙雨
康宏
袁晓洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University
Priority to CN201710325734.0A
Publication of CN107301094A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1443 Transmit or communication errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention relates to a method for constructing a dynamic adaptive data model for large-scale dynamic transaction queries, comprising the following steps: collecting data in real time from data sources such as console, RPC, text, tail, logging systems, and exec; under high throughput, regulating the rates of data acquisition and data processing in the real-time scenario, which reduces the latency with which the system handles large-scale dynamic workloads and ensures system stability; processing each database query request in the workload and extracting the valid partition information to obtain a real-time data model; continuously processing the data in the workload, where the number of processing units can be adjusted dynamically according to the scale of the workload and multiple processing units can work in parallel; and writing the results to a distributed file system and storing them in a MySQL database. The invention uses a streaming framework, allocates resources rationally across a distributed cluster, and improves robustness.

Description

Dynamic adaptive data model for large-scale dynamic transaction queries
Technical field
The present invention relates to a method for constructing a dynamic adaptive data model for large-scale dynamic transaction queries, and more particularly to a system for constructing such a model.
Background
In cloud computing environments oriented toward big data, massive amounts of data are generated rapidly, and interactions between users and applications, and among applications, are increasingly frequent. User requests are becoming personalized and real-time. Large-scale OLAP (On-Line Analytical Processing) and OLTP (On-Line Transaction Processing) applications therefore need to process workloads immediately.
Summary of the invention
The technical problem to be solved by the invention is to provide a dynamic adaptive data model construction method for large-scale dynamic transaction queries, together with a system implementation based on the Storm streaming framework.
The technical solution by which the invention solves the above problem is as follows. A method for constructing a dynamic adaptive data model for large-scale dynamic transaction queries comprises the following steps:
Step 1: Collect data in real time from data sources such as console, RPC, text, tail, logging systems, and exec;
Step 2: Under high throughput, regulate the rates of data acquisition and data processing in the real-time scenario, reducing the latency with which the system handles large-scale dynamic workloads and ensuring system stability;
Step 3: Process each database query request in the workload and extract the valid partition information to obtain a real-time data model;
Step 4: Continuously process the data in the workload; the number of processing units can be adjusted dynamically according to the scale of the workload, and multiple processing units can work in parallel;
Step 5: Write the results to a distributed file system and store them in a MySQL database.
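The rate regulation in step 2 can be pictured with a bounded buffer placed between the collector and the processor. The following is a minimal Python sketch, not the patented implementation: the names `collect`, `process`, and `run_pipeline`, the buffer capacity, and the upper-casing "processing" step are all illustrative assumptions, and a standard-library queue stands in for the middleware.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker


def collect(records, buffer):
    """Collector side: put() blocks while the buffer is full,
    so a fast source is slowed to the processor's pace."""
    for record in records:
        buffer.put(record)
    buffer.put(SENTINEL)


def process(buffer, results):
    """Processor side: drain the buffer until the end marker arrives."""
    while True:
        record = buffer.get()
        if record is SENTINEL:
            break
        results.append(record.upper())  # placeholder for real query processing


def run_pipeline(records, capacity=4):
    """Run collector and processor concurrently over a bounded buffer."""
    buffer = queue.Queue(maxsize=capacity)  # bounded: this is the regulator
    results = []
    producer = threading.Thread(target=collect, args=(records, buffer))
    consumer = threading.Thread(target=process, args=(buffer, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results
```

With a single producer and a single consumer the records come out in arrival order; the capacity only bounds how far acquisition may run ahead of processing.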
The beneficial effects of the invention are as follows. It proposes a dynamic adaptive data model construction method for large-scale dynamic transaction queries that is combined with a streaming framework. Partition information is mapped by building an association matrix, and the horizontal scaling mechanism of the streaming framework provides high scalability and adaptability to high throughput. Experimental results show that the algorithm is an effective means of real-time data partitioning for large-scale, dynamic workloads in a big data environment.
On the basis of the above technical solution, the invention can be further improved as follows.
Further, step 3 further comprises: using the parallel computing mechanism of the streaming framework, when computing the degree of association between each attribute pair of the association matrix M, the computation of each row is assigned to a different computing unit of the streaming framework and executed simultaneously; all intermediate results are then added together to obtain the final result.
The benefit of this further scheme is that the time complexity is reduced to O(1), which improves the execution efficiency of the data partitioning algorithm.
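The row-wise parallel computation can be sketched as follows in Python. This is an illustration under stated assumptions, not the patent's algorithm: the text does not define the degree of association, so the sketch assumes it is the number of queries that access both attributes, and a thread pool stands in for the computing units of the streaming framework. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def association_row(i, attributes, queries):
    """Row i of M: for each attribute j, count the queries touching both i and j.
    (Assumed definition of 'degree of association'.)"""
    a = attributes[i]
    return [sum(1 for q in queries if a in q and b in q) for b in attributes]


def association_matrix(attributes, queries, workers=4):
    """Compute every row in a separate task, mimicking one computing unit per row."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = pool.map(
            lambda i: association_row(i, attributes, queries),
            range(len(attributes)),
        )
        return list(rows)


def total_association(matrix):
    """Add the per-row intermediate results together."""
    return sum(sum(row) for row in matrix)
```

For example, with attributes `["id", "name", "city"]` and three queries accessing `{"id", "name"}`, `{"id", "city"}`, and `{"id", "name"}`, the rows are `[3, 2, 1]`, `[2, 2, 0]`, and `[1, 0, 1]`. Each row depends only on the shared query list, which is why the rows can be computed independently.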
Further, a system for constructing a dynamic adaptive data model for large-scale dynamic transaction queries comprises a data access module, a throughput regulation module, a data processing module, a horizontal scaling module, and a data storage module;
The data access module collects streaming data and adapts to high throughput. It collects data in real time from data sources such as console, RPC, text, tail, logging systems, and exec, and supplies real-time data for further processing by the streaming framework;
The throughput regulation module: in a big data stream computing environment, the acquisition rate and the processing rate are not necessarily synchronized. Under high throughput, the throughput regulation module regulates the rates of data acquisition and data processing in the real-time scenario, reducing the latency with which the system handles large-scale dynamic workloads and ensuring system stability;
The data processing module processes each database query request in the workload and obtains a real-time data model. It preprocesses the input workload and extracts the valid partition information. It has multiple processing units, which enables parallel processing and reduces time complexity;
The horizontal scaling module: with big data, the data scale exceeds the processing capacity of a single machine. Facing large-scale loads, the horizontal scaling module scales out flexibly by adding processing units, which increases the parallelism of the algorithm and reduces its complexity;
The data storage module persists the partitioning results: it writes them to the distributed file system and stores them in the MySQL database, so that these real-time results are available for further research and computation.
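The persistence performed by the data storage module can be sketched as below. To keep the example self-contained it uses Python's built-in `sqlite3` as a stand-in for MySQL, so the SQL dialect differs (`INSERT OR REPLACE` is SQLite syntax). The table name `partition_result`, its columns, and both function names are hypothetical; the patent does not specify a schema.

```python
import sqlite3


def save_partition_result(conn, table_name, partition_key, score):
    """Upsert one partitioning result (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS partition_result ("
        " table_name TEXT, partition_key TEXT, score REAL,"
        " PRIMARY KEY (table_name, partition_key))"
    )
    # A newer result for the same table and key replaces the older one.
    conn.execute(
        "INSERT OR REPLACE INTO partition_result VALUES (?, ?, ?)",
        (table_name, partition_key, score),
    )
    conn.commit()


def load_partition_results(conn):
    """Read back all stored results in a stable order."""
    cursor = conn.execute(
        "SELECT table_name, partition_key, score FROM partition_result"
        " ORDER BY table_name, partition_key"
    )
    return cursor.fetchall()
```

Keying the table on the (table, partition key) pair means that each processing round overwrites the previous result, which matches the idea of a continuously updated real-time model.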
The benefit of this further scheme is that it addresses the timeliness problem of data modeling for large-scale, dynamic, unknown workloads in a big data environment. Because data model construction must be combined with a stream computing framework to achieve this, a data model construction scheme and a related system based on a streaming framework are proposed.
Further, the system for constructing a dynamic adaptive data model for large-scale dynamic transaction queries is characterized by:
1) Dynamic adaptive data model construction: a partitioning strategy generation and dynamic update module updates the partitioning strategy dynamically after each round of data processing;
2) Fault tolerance management: the fault tolerance and verification mechanism of the streaming framework is used to manage faults. For example, Kafka is used to implement data replay: the streaming data is retained in the system for a period of time so that, when an error occurs during data processing, transmission can restart from a given point;
3) Reliability: the data access module captures data dynamically and, through throughput regulation, ensures stable processing under high throughput. The throughput regulation module handles unknown data through scheduling adaptation and load balancing, and the data model can be adjusted dynamically as the workload changes;
4) Horizontal scaling: the horizontal scaling module adds data processing units when facing large-scale, dynamic loads, giving the system high scalability and high availability.
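The replay idea in item 2) can be illustrated with a small in-memory log that retains recent records and lets a consumer restart from a chosen offset. This Python sketch only mimics the behaviour described; it does not use the Kafka API, and the class name `ReplayLog`, the count-based retention, and the method names are assumptions made for illustration.

```python
class ReplayLog:
    """Append-only log that keeps the most recent `retention` records."""

    def __init__(self, retention=1000):
        self.retention = retention
        self.records = []
        self.base_offset = 0  # offset of records[0]

    def append(self, record):
        """Store a record and return its offset."""
        self.records.append(record)
        if len(self.records) > self.retention:
            self.records.pop(0)      # oldest record ages out
            self.base_offset += 1
        return self.base_offset + len(self.records) - 1

    def replay_from(self, offset):
        """Return every retained record from `offset` onward."""
        if offset < self.base_offset:
            raise ValueError("offset has already been discarded")
        return self.records[offset - self.base_offset:]
```

A processor that fails after handling offset 2 can call `replay_from(3)` to resume, provided the retention window still covers that offset.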
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the method of the invention;
Fig. 2 is a structural diagram of the apparatus of the invention.
Description of reference numerals: 1: data access module; 2: throughput regulation module; 3: data processing module; 4: horizontal scaling module; 5: data storage module.
Detailed description
The principles and features of the invention are described below with reference to the accompanying drawings. The examples given serve only to explain the invention and are not intended to limit its scope.
Fig. 1 shows the flow chart of the method steps, and Fig. 2 shows the structure of the apparatus.
Embodiment 1
A method for constructing a dynamic adaptive data model for large-scale dynamic transaction queries comprises the following steps:
Step 1: Data collection is implemented with Flume. Flume, provided by Cloudera, is a distributed, reliable, and highly available system for collecting, aggregating, and transporting massive volumes of log data, and it can continuously collect data from different data sources. A data generator is built to produce log files in real time, and these log files serve as the data source for collection;
Step 2: Kafka acts as middleware to regulate high throughput in the real-time scenario, so that the system adapts to dynamic changes in load;
Step 3: The workload is preprocessed and the partitioning algorithm is run to obtain a real-time partitioning scheme. For the data processing implementation, Storm provides an API: only the Spout and Bolt functions need to be customized and the direction of the data flow between the Bolts specified, after which real-time computation over streaming big data is carried out by executing the data flow operations;
Step 3 further comprises: using the parallel computing mechanism of the streaming framework, when computing the degree of association between each attribute pair of the association matrix M, the computation of each row is assigned to a different computing unit of the streaming framework and executed simultaneously; all intermediate results are then added together to obtain the final result.
This stage extracts the partition information in the workload and performs statistical calculations. Its input is the large-scale, dynamic, unknown workload from step 1. The real-time processing capability of the streaming framework ensures that unknown streaming data is handled promptly, and load mapping yields an association matrix containing the partition information.
Step 4: Computing tasks in Storm can run in parallel across multiple threads, processes, and servers. In addition, Zookeeper provides a distributed coordination service, so horizontal scaling can be carried out flexibly by adding physical nodes.
When a large volume of data arrives, multiple processes can be started on one machine, or physical nodes can be added to increase the number of processing units. This raises the parallelism of the system, achieves horizontal scaling, and reduces processing time;
Step 5: The data storage module is implemented with a MySQL database. A MySQL interface is implemented in Storm, and the partitioning results are saved to the MySQL database.
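The scaling behaviour of step 4 can be sketched in Python as choosing a number of processing units from the size of the incoming workload and then processing in parallel. The sizing rule (one unit per `per_unit_capacity` queries, capped at `max_units`), the function names, and the whitespace-stripping placeholder are assumptions for illustration. A thread pool stands in for Storm workers, and no Storm or Zookeeper API is used.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def units_needed(workload_size, per_unit_capacity, max_units):
    """Assumed sizing rule: enough units to cover the workload, within a cap."""
    if workload_size <= 0:
        return 1
    return min(max_units, math.ceil(workload_size / per_unit_capacity))


def process_workload(queries, per_unit_capacity=2, max_units=8):
    """Size the pool from the workload, then process the queries in parallel."""
    units = units_needed(len(queries), per_unit_capacity, max_units)
    with ThreadPoolExecutor(max_workers=units) as pool:
        # str.strip is a placeholder for real per-query processing.
        results = list(pool.map(str.strip, queries))
    return units, results
```

Here five queries with a capacity of two per unit yield three units, while a hundred queries are capped at eight. In a real deployment the cap would correspond to the processes and physical nodes available.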
A system for constructing a dynamic adaptive data model for large-scale dynamic transaction queries comprises a data access module 1, a throughput regulation module 2, a data processing module 3, a horizontal scaling module 4, and a data storage module 5;
The data access module (1) collects streaming data and adapts to high throughput. It collects data in real time from data sources such as console, RPC, text, tail, logging systems, and exec, and supplies real-time data for further processing by the streaming framework;
The throughput regulation module (2): in a big data stream computing environment, the acquisition rate and the processing rate are not necessarily synchronized. Under high throughput, the throughput regulation module regulates the rates of data acquisition and data processing in the real-time scenario, reducing the latency with which the system handles large-scale dynamic workloads and ensuring system stability;
The data processing module (3) processes each database query request in the workload and obtains a real-time data model. It preprocesses the input workload and extracts the valid partition information. It has multiple processing units, which enables parallel processing and reduces time complexity;
The horizontal scaling module (4): with big data, the data scale exceeds the processing capacity of a single machine. Facing large-scale loads, the horizontal scaling module scales out flexibly by adding processing units, which increases the parallelism of the algorithm and reduces its complexity;
The data storage module (5) persists the partitioning results: it writes them to the distributed file system and stores them in the MySQL database, so that these real-time results are available for further research and computation.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (5)

1. A method for constructing a dynamic adaptive data model for large-scale dynamic transaction queries, characterized by comprising the following steps:
Step 1: Collect data in real time from data sources such as console, RPC, text, tail, logging systems, and exec;
Step 2: Under high throughput, regulate the rates of data acquisition and data processing in the real-time scenario, reducing the latency with which the system handles large-scale dynamic workloads and ensuring system stability;
Step 3: Process each database query request in the workload and extract the valid partition information to obtain a real-time data model;
Step 4: Continuously process the data in the workload; the number of processing units can be adjusted dynamically according to the scale of the workload, and multiple processing units can work in parallel;
Step 5: Write the results to a distributed file system and store them in a MySQL database.
2. The method for constructing a dynamic adaptive data model for large-scale dynamic transaction queries according to claim 1, characterized in that step 3 further comprises: using the parallel computing mechanism of the streaming framework, when computing the degree of association between each attribute pair of the association matrix M, the computation of each row is assigned to a different computing unit of the streaming framework and executed simultaneously; all intermediate results are then added together to obtain the final result.
3. The method for constructing a dynamic adaptive data model for large-scale dynamic transaction queries according to any one of claims 1 to 2, characterized by: dynamic incremental updating; handling of unknown workloads; real-time processing, in which the parallel computing mechanism of the streaming framework improves execution efficiency; and horizontal scaling and adaptability to high throughput. WSPA combines the algorithm with the streaming framework and uses the framework's horizontal scaling mechanism so that, when processing large-scale, dynamic workloads, horizontal scaling can be achieved flexibly by adding physical nodes. In addition, by combining with data access components such as Flume and Kafka, the algorithm still performs well when facing high-throughput workloads.
4. A system for constructing a dynamic adaptive data model for large-scale dynamic transaction queries, characterized by comprising a data access module (1), a throughput regulation module (2), a data processing module (3), a horizontal scaling module (4), and a data storage module (5);
The data access module (1) collects streaming data and adapts to high throughput. It collects data in real time from data sources such as console, RPC, text, tail, logging systems, and exec, and supplies real-time data for further processing by the streaming framework;
The throughput regulation module (2): in a big data stream computing environment, the acquisition rate and the processing rate are not necessarily synchronized. Under high throughput, the throughput regulation module regulates the rates of data acquisition and data processing in the real-time scenario, reducing the latency with which the system handles large-scale dynamic workloads and ensuring system stability;
The data processing module (3) processes each database query request in the workload and obtains a real-time data model. It preprocesses the input workload and extracts the valid partition information. It has multiple processing units, which enables parallel processing and reduces time complexity;
The horizontal scaling module (4): with big data, the data scale exceeds the processing capacity of a single machine. Facing large-scale loads, the horizontal scaling module scales out flexibly by adding processing units, which increases the parallelism of the algorithm and reduces its complexity;
The data storage module (5) persists the partitioning results: it writes them to the distributed file system and stores them in the MySQL database, so that these real-time results are available for further research and computation.
5. The system for constructing a dynamic adaptive data model for large-scale dynamic transaction queries according to claim 4, characterized by:
1) Dynamic adaptive data model construction: a partitioning strategy generation and dynamic update module updates the partitioning strategy dynamically after each round of data processing;
2) Fault tolerance management: the fault tolerance and verification mechanism of the streaming framework is used to manage faults. For example, Kafka is used to implement data replay: the streaming data is retained in the system for a period of time so that, when an error occurs during data processing, transmission can restart from a given point;
3) Reliability: the data access module captures data dynamically and, through throughput regulation, ensures stable processing under high throughput. The throughput regulation module handles unknown data through scheduling adaptation and load balancing, and the data model can be adjusted dynamically as the workload changes;
4) Horizontal scaling: the horizontal scaling module adds data processing units when facing large-scale, dynamic loads, giving the system high scalability and high availability.
CN201710325734.0A 2017-05-10 2017-05-10 Dynamic adaptive data model for large-scale dynamic transaction queries Pending CN107301094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710325734.0A CN107301094A (en) 2017-05-10 2017-05-10 Dynamic adaptive data model for large-scale dynamic transaction queries

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710325734.0A CN107301094A (en) 2017-05-10 2017-05-10 Dynamic adaptive data model for large-scale dynamic transaction queries

Publications (1)

Publication Number Publication Date
CN107301094A 2017-10-27

Family

ID=60137069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710325734.0A Pending CN107301094A (en) 2017-05-10 2017-05-10 Dynamic adaptive data model for large-scale dynamic transaction queries

Country Status (1)

Country Link
CN (1) CN107301094A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108121645A (en) * 2017-12-25 2018-06-05 深圳市分期乐网络科技有限公司 A kind of daily record method for evaluating quality, device, server and storage medium
CN109271395A (en) * 2018-09-11 2019-01-25 南京轨道交通系统工程有限公司 Extensive real time data for comprehensive monitoring system updates delivery system and method
CN109327329A (en) * 2018-08-31 2019-02-12 华为技术有限公司 Data model update method and device
CN112685403A (en) * 2019-10-18 2021-04-20 上海同是科技股份有限公司 High-availability framework system for hidden danger troubleshooting data storage and implementation method thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103747060A (en) * 2013-12-26 2014-04-23 惠州华阳通用电子有限公司 Distributed monitor system and method based on streaming media service cluster
CN103853844A (en) * 2014-03-24 2014-06-11 南开大学 Hadoop-based relation table nonredundant key set identification method
US20160105352A1 (en) * 2014-10-09 2016-04-14 Fujitsu Limited File system, control program of file system management device, and method of controlling file system
CN106446126A (en) * 2016-09-19 2017-02-22 哈尔滨航天恒星数据系统科技有限公司 Massive space information data storage management method and storage management device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
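Kang Hong (康宏) et al.: "Application-driven real-time data partitioning algorithm based on a streaming framework" (应用驱动的基于流式框架的实时数据分区算法), Application Research of Computers (《计算机应用研究》) *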

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108121645A (en) * 2017-12-25 2018-06-05 深圳市分期乐网络科技有限公司 A kind of daily record method for evaluating quality, device, server and storage medium
CN109327329A (en) * 2018-08-31 2019-02-12 华为技术有限公司 Data model update method and device
CN109327329B (en) * 2018-08-31 2021-11-09 华为技术有限公司 Data model updating method and device
CN109271395A (en) * 2018-09-11 2019-01-25 南京轨道交通系统工程有限公司 Extensive real time data for comprehensive monitoring system updates delivery system and method
CN112685403A (en) * 2019-10-18 2021-04-20 上海同是科技股份有限公司 High-availability framework system for hidden danger troubleshooting data storage and implementation method thereof

Similar Documents

Publication Publication Date Title
Fujimoto Parallel and distributed simulation systems
CN105550323B (en) Load balance prediction method and prediction analyzer for distributed database
CN107124394B (en) Power communication network security situation prediction method and system
CN107301094A (en) Dynamic adaptive data model for large-scale dynamic transaction queries
CN105117497B (en) Ocean big data principal and subordinate directory system and method based on Spark cloud network
CN111586091B (en) Edge computing gateway system for realizing computing power assembly
CN108170530B (en) Hadoop load balancing task scheduling method based on mixed element heuristic algorithm
CN106777093A (en) Skyline inquiry systems based on space time series data stream application
CN106708989A (en) Spatial time sequence data stream application-based Skyline query method
CN104156810A (en) Power dispatching production management system based on cloud computing and realization method of power dispatching production management system
CN103188346A (en) Distributed decision making supporting massive high-concurrency access I/O (Input/output) server load balancing system
CN104104621B (en) A kind of virtual network resource dynamic self-adapting adjusting method based on Nonlinear Dimension Reduction
CN103401939A (en) Load balancing method adopting mixing scheduling strategy
CN113342510B (en) Water and power basin emergency command cloud-side computing resource cooperative processing method
CN109034386A (en) A kind of deep learning system and method based on Resource Scheduler
CN110659278A (en) Graph data distributed processing system based on CPU-GPU heterogeneous architecture
CN112948123B (en) Spark-based grid hydrological model distributed computing method
CN115134371A (en) Scheduling method, system, equipment and medium containing edge network computing resources
CN105975345A (en) Video frame data dynamic equilibrium memory management method based on distributed memory
CN105205052A (en) Method and device for mining data
CN110245135A (en) A kind of extensive streaming diagram data update method based on NUMA architecture
CN106980540A (en) A kind of computational methods of distributed Multidimensional Discrete data
CN107257356B (en) Social user data optimal placement method based on hypergraph segmentation
CN109359205A (en) A kind of remote sensing image cutting method and equipment based on geographical grid
CN101436204A (en) City evolvement simulation implementing method based on paralleling elementary cell automatic machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171027
