CN109063115A - Intelligent statistical system and method based on online big data

Intelligent statistical system and method based on online big data

Info

Publication number
CN109063115A
Authority
CN
China
Prior art keywords
data
statistical
module
statistics
demand
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810852774.5A
Other languages
Chinese (zh)
Inventor
汪海波
程乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaian Vocational College of Information Technology
Original Assignee
Huaian Vocational College of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaian Vocational College of Information Technology
Priority to CN201810852774.5A
Publication of CN109063115A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses an intelligent statistical system and method based on online big data. The intelligent statistical system comprises a system management module, a data acquisition module, a document management module, a data statistics module, a data analysis module, a data memory module, an enquiry module, statistics files and data files. The intelligent statistical method comprises an online data statistics method and an intelligent data analysis method. The data statistics module can generate a statistic algorithm and obtain statistical results without developers modifying code for each statistical demand, which shortens the time needed to implement a statistical demand and reduces cost. The data analysis module follows a classification-based approach, which facilitates information system management, improves its efficiency, increases the amount and efficiency of resource utilization, and helps users make correct judgments, giving the invention high application value.

Description

Intelligent statistical system and method based on online big data
Technical field
The present invention relates to the technical field of data processing, and in particular to an intelligent statistical system and method based on online big data.
Background art
Big data refers to data sets that cannot be captured, managed and processed with conventional software tools within a certain period of time. It is a massive, fast-growing and diversified information asset that requires new processing modes to deliver stronger decision-making power, insight and process-optimization capability. The world has entered the era of big data. With the application of multimedia and many other technologies, large amounts of data emerge at every moment in the related fields of society, which makes intelligent data processing and analysis in the big data context more difficult. Big data is usually complex, large in volume and distributed, and it needs special processing techniques to handle large amounts of data effectively within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing (MPP) databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet and scalable storage systems.
Compared with traditional data analysis, the statistical analysis of big data is characterized by large data volumes and complex query analysis, and therefore calls for new big data analysis methods and theories. On the one hand, existing single intelligent data analysis methods cannot handle data analysis comprehensively and efficiently. On the other hand, the development of intelligent analysis technology for big data also depends on the support of novel data storage and organization technologies and of new, efficient computing methods. Data storage and organization technology should adopt better distributed data storage strategies, improve data throughput as far as possible, and reduce the failure rate.
A large amount of data accumulates in information systems, but the raw data is of little value. Only by extracting its essence through intelligent statistical analysis can it be turned into an information "gold mine" that benefits people. Existing data statistics and analysis systems are relatively rigid: their statistics are neither fine-grained nor well ordered, and they cannot classify and process different types of data reasonably according to the characteristics of the data itself. As a result, users cannot obtain the information they want quickly and accurately, and the systems cannot help them make correct decisions within a limited time.
Summary of the invention
The purpose of the present invention is to provide an intelligent statistical system and method based on online big data, so as to solve the problems in the prior art.
To achieve the above purpose, the first object of the present invention is to provide an intelligent statistical system based on online big data, characterized in that it comprises a system management module, a data acquisition module, a document management module, a data statistics module, a data analysis module, a data memory module, an enquiry module, statistics files and data files, wherein:
The system management module is used to start and stop the other modules, to schedule and manage them, and to monitor their running status;
The data acquisition module is used to receive source data and to pass the source data and its data format to the document management module;
The document management module is used, after receiving the source data, to group the received source data according to a statistics file control table and the set statistical time granularity, and to store it in the respective statistics files. A timeout is also set: after a record has been stored, if no further data for that time granularity arrives within one statistical time granularity plus the set timeout, each group of statistics files is sent to the data statistics module;
The data statistics module is used to compute statistics on each statistics file according to the statistical rules and to generate a statistical result, obtaining first data;
The data analysis module is used to apply different intelligent data analysis methods to the first data obtained by the data statistics module for classification processing, obtaining second data;
The data memory module is used to convert the second data obtained by the data analysis module into individual records and to store them in the data file;
The enquiry module is used by users to query the data in the data file according to their needs and to obtain the desired information.
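The grouping and timeout behaviour of the document management module can be illustrated with a minimal Python sketch. It is only one possible reading of the description: the class name `StatisticsFileBuffer`, the use of integer timestamps in seconds, and the in-memory dictionaries standing in for statistics files are all assumptions made for illustration and do not come from the patent.

```python
from collections import defaultdict


class StatisticsFileBuffer:
    """Hypothetical stand-in for the statistics files: groups records into
    time buckets and releases a bucket once it has timed out."""

    def __init__(self, granularity_s, timeout_s):
        self.granularity_s = granularity_s   # statistical time granularity
        self.timeout_s = timeout_s           # extra waiting time
        self.buckets = defaultdict(list)     # bucket start -> records
        self.last_seen = {}                  # bucket start -> last arrival time

    def add(self, record_time, record, now):
        # Align the record to the start of its time-granularity bucket.
        start = record_time - (record_time % self.granularity_s)
        self.buckets[start].append(record)
        self.last_seen[start] = now

    def flush_expired(self, now):
        """Return {bucket_start: records} for every bucket that has received
        no data for one granularity plus the timeout."""
        limit = self.granularity_s + self.timeout_s
        expired = [s for s, seen in self.last_seen.items() if now - seen >= limit]
        ready = {}
        for start in sorted(expired):
            ready[start] = self.buckets.pop(start)
            del self.last_seen[start]
        return ready
```

In this sketch a caller would invoke `flush_expired` periodically and hand the returned groups to whatever plays the role of the data statistics module.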
Preferably, the system management module is also used to increase or decrease the number of data statistics modules running in parallel according to the load on the computer's CPU, and to raise alarms on abnormal conditions and generate log information.
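A minimal sketch of such a load-based adjustment follows. The patent does not state the thresholds or the direction of adjustment, so the function name `next_worker_count`, the thresholds 0.3 and 0.8, the limits of 1 and 8 workers, and the policy "fewer workers under high load, more under low load" are all assumptions chosen only to make the idea concrete.

```python
def next_worker_count(current, cpu_load, low=0.3, high=0.8,
                      min_workers=1, max_workers=8):
    """Return the number of parallel data statistics workers to run next.

    cpu_load is a fraction between 0.0 and 1.0 (assumed to be measured
    elsewhere). All thresholds and limits are illustrative assumptions.
    """
    if cpu_load > high:
        # CPU is saturated: back off by one worker, but keep at least one.
        return max(min_workers, current - 1)
    if cpu_load < low:
        # CPU is mostly idle: add one worker, up to the upper limit.
        return min(max_workers, current + 1)
    return current
```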
Preferably, the source data includes: e-commerce data, tourism and transportation data, financial data, retail industry data, medical industry data, information and entertainment industry data, public policy information data, and online operation log data.
Preferably, the data statistics module includes a first acquisition unit, a first generation unit, a first transmission unit and a first receiving unit, wherein:
The first acquisition unit is used to read the statistical rule corresponding to a statistical demand according to the identifier corresponding to that statistical demand, and to obtain the data specified by the statistical rule, wherein the data specified by the statistical rule includes: the data table to be counted, the static fields, the measurement type and the ordering rule of the statistical result;
The first generation unit is used to generate a statistic algorithm according to the obtained static fields and measurement type, and to generate a statistical result according to the statistic algorithm, the data table to be counted and the ordering rule of the statistical result;
The first transmission unit is used to send a request to call a statistical demand, so that the system management module returns the identifier corresponding to the statistical demand in response to that request;
The first receiving unit is used to receive the identifier corresponding to the statistical demand.
Preferably, the system management module includes a second acquisition unit, a second generation unit, a second transmission unit and a second receiving unit, wherein:
The second acquisition unit is used to select, according to a statistical demand request, the data table to be counted that corresponds to the statistical demand, and to obtain the static fields, the measurement type and the ordering rule of the statistical result;
The second generation unit is used to generate the statistical rule corresponding to the statistical demand and the identifier corresponding to the statistical demand according to the data table to be counted, the static fields, the measurement type and the ordering rule of the statistical result, so that the data statistics system can generate a statistic algorithm according to the statistical rule;
The second transmission unit is specifically used to send the identifier corresponding to the statistical demand in response to the request to call the statistical demand;
The second receiving unit is used to receive the request to call the statistical demand.
Preferably, the data memory module includes: a data encryption unit, an analysis track storage unit and a doubtful point storage unit, wherein:
The data encryption unit is used to set data access permissions;
The analysis track storage unit is used to identify and store the analysis track of the data;
The doubtful point storage unit is used to update data in which doubtful points exist so that it can be called by the data analysis module, and to issue an automatic early warning when information of the same category appears.
Preferably, the enquiry module includes: a Bluetooth server and a handheld smart mobile device.
The second object of the present invention is to provide an intelligent statistical method based on online big data, characterized in that it comprises an online data statistics method and an intelligent data analysis method.
Further, the online data statistics method comprises the steps of:
The data statistics module sends a request to call a statistical demand, so that the system management module returns the identifier corresponding to the statistical demand in response to that request;
The system management module obtains, according to the statistical demand request, the data table to be counted that corresponds to the statistical demand, the static fields, the measurement type and the ordering rule of the statistical result;
The system management module generates the statistical rule corresponding to the statistical demand and the identifier corresponding to the statistical demand according to the data table to be counted, the static fields, the measurement type and the ordering rule of the statistical result, so that the data statistics system can generate a statistic algorithm according to the statistical rule;
The data statistics module receives the identifier corresponding to the statistical demand;
The data statistics module obtains the statistical rule corresponding to the statistical demand according to the identifier corresponding to the statistical demand;
The data statistics module obtains the data specified by the statistical rule, wherein the data specified by the statistical rule includes: the data table to be counted, the static fields, the measurement type and the ordering rule of the statistical result;
The data statistics module generates a statistic algorithm according to the obtained static fields and measurement type;
The data statistics module generates a statistical result according to the statistic algorithm, the data table to be counted and the ordering rule of the statistical result.
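The steps above can be sketched in Python under some loud assumptions. The patent does not define the structure of a statistical rule, so the sketch treats a rule as a dictionary holding a table name, one grouping field, one value field, a measurement type (count, sum, avg, max or min) and a sort direction. The names `RULES`, `AGGREGATORS`, `register_rule` and `run_statistics` are hypothetical, and the in-memory registry merely stands in for whatever the system management module would use to map identifiers to rules.

```python
import uuid
from collections import defaultdict

# Hypothetical in-memory registry: identifier -> statistical rule.
RULES = {}

# Assumed mapping from measurement type to an aggregation function.
AGGREGATORS = {
    "count": len,
    "sum": sum,
    "avg": lambda xs: sum(xs) / len(xs),
    "max": max,
    "min": min,
}


def register_rule(table, group_field, value_field, measurement, descending=True):
    """System management side: build a rule and return its identifier."""
    identifier = uuid.uuid4().hex
    RULES[identifier] = {
        "table": table,
        "group_field": group_field,
        "value_field": value_field,
        "measurement": measurement,
        "descending": descending,   # ordering rule of the statistical result
    }
    return identifier


def run_statistics(identifier, tables):
    """Data statistics side: look up the rule, derive the aggregation from
    the fields and measurement type, and return the ordered result."""
    rule = RULES[identifier]
    aggregate = AGGREGATORS[rule["measurement"]]
    groups = defaultdict(list)
    for row in tables[rule["table"]]:
        groups[row[rule["group_field"]]].append(row[rule["value_field"]])
    result = [(key, aggregate(values)) for key, values in groups.items()]
    result.sort(key=lambda kv: kv[1], reverse=rule["descending"])
    return result
```

Under these assumptions a new statistical demand only requires a new call to `register_rule`, not a change to `run_statistics`, which is the point the description makes about avoiding code modifications.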
Further, the intelligent data analysis method follows a classification-based approach: the data analysis module applies intelligent data analysis technology and uses different data analysis methods for different types of data, including the decision tree method, the association rule method, the rough set method, the fuzzy mathematics analysis method, the artificial neural network method, the chaos and fractal theory method, and the natural computation analysis method.
The first is the decision tree method. Given the known probabilities of various outcomes, the decision tree method builds a decision tree to obtain the probability that the expected net present value is greater than or equal to zero, and so evaluates project risk and judges feasibility; it is an intuitive graphical method that applies probability analysis. As a data analysis technique, it is a method of classifying data that is built on information theory. Its output is easy to understand, its accuracy is relatively high and it is efficient, but it cannot be used to process and analyze complex data.
Common methods include classification and regression trees (CART) and chi-squared automatic interaction detection (CHAID). Classification trees are mainly used to label and classify data records, while regression trees are mainly used to estimate the value of a target variable.
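As an illustration of how a classification tree chooses a split, the following Python sketch computes the Gini impurity used by CART-style trees and searches one numeric attribute for the best threshold. It is a single-split sketch, not a full tree builder, and the function names `gini` and `best_split` are chosen here for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`
    on one numeric attribute, or (None, gini(labels)) if no split helps."""
    n = len(labels)
    best = (None, gini(labels))
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best
```

A full tree would apply `best_split` recursively to the left and right subsets until the subsets are pure or a depth limit is reached.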
The second method is the association rule method, which is mainly applied to transaction databases. Association rule analysis discovers valuable associations or correlations between item sets in large amounts of data. Such transaction databases usually contain extremely large amounts of data, so current algorithms concentrate on reducing the search space. Common association rule algorithms include the Apriori algorithm, partition-based algorithms and the FP-tree frequent itemset algorithm.
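The way Apriori reduces the search space can be shown with a compact Python sketch of its frequent-itemset phase: candidates of size k are built only from frequent itemsets of size k-1, and any candidate with an infrequent subset is pruned before its support is counted. The function name `apriori` and the representation of transactions as sets of strings are illustrative assumptions; rule generation from the frequent itemsets is left out.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}   # size-1 candidates
    frequent = {}
    k = 1
    while current:
        level = {s: support(s) for s in current}
        level = {s: sup for s, sup in level.items() if sup >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {c for c in candidates
                   if all(frozenset(sub) in level
                          for sub in combinations(c, k - 1))}
    return frequent
```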
The third method is the rough set method. The rough set method does not require a subjective assessment of the data: by observing the data alone it can remove redundant information, and it supports big data well. Its ideas come mainly from statistics and machine learning, but it is not a simple application of those two tools. It is based on rough set theory and takes the information system represented by a data table as its carrier; by analyzing the properties of a given data set, such as rough classification and the certainty and coverage factors of decision rules, it obtains implicit and potentially useful knowledge.
The rough set method needs no subjective evaluation of knowledge or data, can delete redundant information based on observed data alone, is well suited to parallel computation, and provides a direct interpretation of results.
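The core construction of rough set theory, the lower and upper approximation of a target set, can be sketched in a few lines of Python. Objects that agree on all chosen condition attributes are indiscernible and form one equivalence class; the lower approximation collects the classes that lie entirely inside the target set, and the upper approximation collects the classes that overlap it. The function name `approximations` and the dictionary-based data table are assumptions made for illustration.

```python
from collections import defaultdict


def approximations(rows, condition_attrs, target):
    """rows: {object_id: {attribute: value}}; target: set of object ids.
    Returns (lower, upper) approximations of the target set."""
    # Group objects that are indiscernible on the condition attributes.
    classes = defaultdict(set)
    for obj, attrs in rows.items():
        classes[tuple(attrs[a] for a in condition_attrs)].add(obj)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:      # class certainly belongs to the target
            lower |= block
        if block & target:       # class possibly belongs to the target
            upper |= block
    return lower, upper
```

If dropping an attribute from `condition_attrs` leaves both approximations unchanged, that attribute is redundant for describing the target set, which is the sense in which the method removes redundant information from observed data alone.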
The fourth method is the fuzzy mathematics analysis method, which performs fuzzy analysis of practical problems and can obtain more objective results than other analysis methods. Objective things in the real world usually carry some uncertainty. The more complex a system is, the lower its precision and the stronger its fuzziness. During data analysis, fuzzy set methods are used to carry out fuzzy evaluation, fuzzy decision-making, fuzzy prediction, fuzzy pattern recognition and fuzzy cluster analysis on practical problems, which yields better and more objective results.
The shortcomings of fuzzy analysis are mainly the following: it is user-driven and requires too much user participation; it handles only simple variables and cannot process qualitative variables or complex data such as nonlinear data and multimedia data; its main purpose is to discover facts or rules through queries, so it has little influence on prediction and decision-making; and it relies too heavily on subjective experience.
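Fuzzy evaluation can be illustrated with a small Python sketch of fuzzy comprehensive evaluation using the weighted-average operator, which is one common choice and is not specified by the patent. Each row of the membership matrix gives the degree to which one factor belongs to each grade, and the weights express the relative importance of the factors. The function name `fuzzy_evaluate` and the example numbers are hypothetical.

```python
def fuzzy_evaluate(weights, membership):
    """Combine factor weights with a membership matrix.

    weights[i]        importance of factor i
    membership[i][j]  degree to which factor i belongs to grade j
    Returns the normalized evaluation vector over the grades.
    """
    grades = len(membership[0])
    scores = [sum(w * row[j] for w, row in zip(weights, membership))
              for j in range(grades)]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical example: three factors rated against (good, fair, poor).
weights = [0.5, 0.3, 0.2]
membership = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
]
evaluation = fuzzy_evaluate(weights, membership)
best_grade = evaluation.index(max(evaluation))  # index of the dominant grade
```

The weights and membership degrees have to be supplied by the user, which reflects the reliance on user participation and subjective experience noted above.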
The fifth method is the artificial neural network method, which has a self-learning function and, on that basis, an associative memory function. An artificial neural network is a mathematical model that processes information using a structure similar to the synaptic connections of the brain; it consists of a large number of interconnected nodes (neurons). Each node represents a specific output function, called the activation function. Each connection between two nodes carries a weighting value for the signal passing through that connection, called the weight, which serves as the memory of the network. The output of the network depends on its connection pattern, its weights and its activation functions. The network itself usually approximates some algorithm or function found in nature, or expresses a logical strategy.
Typical neural network models fall into three main categories: feedforward neural network models, feedback neural network models and self-organizing map models. Artificial neural networks are nonlinear, non-local, non-stationary and non-convex, and they have three main advantages: first, a self-learning function; second, an associative memory function; third, the ability to search for optimal solutions at high speed.
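The roles of nodes, weights and activation functions in a feedforward model can be shown with a short Python sketch of the forward pass. The sigmoid activation and the hand-picked weights that make a two-layer network compute XOR are illustrative assumptions; training, which is where the self-learning function comes in, is not shown.

```python
import math


def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, layers):
    """Propagate inputs through a feedforward network.

    layers is a list of (weights, biases); weights[j][i] is the weight of
    the connection from input i to neuron j of that layer.
    """
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


# Hand-picked weights (an assumption, not learned): the hidden layer computes
# OR and NAND, and the output neuron combines them with AND, giving XOR.
xor_layers = [
    ([[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]),
    ([[20.0, 20.0]], [-30.0]),
]
```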
The sixth method is the chaos and fractal theory method, which is mainly used to explain phenomena found in nature and society. It is generally used in intelligent cognition research and can also be applied in many fields such as automatic control.
Chaos and fractals are two key concepts in nonlinear science, which studies the relationship between determinism and randomness inside nonlinear systems. Chaos describes an unstable motion of a nonlinear dynamical system whose trajectory is confined to a bounded region but never repeats. Fractals describe objects that look disordered and changeable on the surface but in fact hide an inherent regularity. The two can therefore be used to explain many common phenomena in nature and the social sciences. Their theory and methods can serve as the basis for applications in many fields, such as intelligent cognition research, graphics and image processing, automatic control and economic management.
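A bounded but non-repeating trajectory can be illustrated with the logistic map, a standard textbook example of chaos that the patent itself does not mention. In the Python sketch below, the function name `logistic_orbit`, the parameter r = 4.0 and the starting values are assumptions chosen for illustration; two orbits that start almost at the same point stay inside the interval [0, 1] yet drift far apart after a few dozen steps.

```python
def logistic_orbit(x0, r=4.0, steps=60):
    """Iterate the logistic map x -> r * x * (1 - x) starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two orbits whose starting points differ by one part in a billion.
orbit_a = logistic_orbit(0.2)
orbit_b = logistic_orbit(0.2 + 1e-9)
largest_gap = max(abs(a - b) for a, b in zip(orbit_a, orbit_b))
```

The rule that generates the orbit is fully deterministic, which is the relationship between determinism and apparent randomness that the paragraph above describes.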
The seventh method is the natural computation analysis method. According to the biological level being simulated and emulated, it can generally be divided into three types of analysis method: swarm intelligence algorithms, immune algorithms and DNA algorithms. Swarm intelligence mainly studies collective behavior. Immune algorithms are diverse; the classical ones mainly include negative selection and clonal selection. The DNA algorithm is essentially a randomized search method: it can perform global optimization, can generally find the search space to optimize in practical use, can adjust the search direction automatically, and needs no predetermined rules throughout the process. DNA algorithms are currently used widely in many industries and have achieved good results.
Natural computation refers to computation inspired by organisms in nature: it simulates dynamic processes that occur in nature and can readily be interpreted as computational processes. Depending on the level at which simulation and emulation take place, it includes swarm intelligence algorithms, immune algorithms, DNA algorithms and so on.
Swarm intelligence (SI) is an emerging evolutionary computation technique that imitates the foraging and nest-building behavior of animals and insects in nature. It studies the collective behavior of decentralized systems composed of a number of simple individuals, each of which interacts with the other individuals and with the environment. The main SI algorithms at present are particle swarm optimization, the ant colony algorithm, the cultural algorithm, the artificial fish swarm algorithm and the foraging algorithm. Classical immune algorithms include negative selection, clonal selection, immune networks and danger theory.
The genetic algorithm is a randomized search method derived from the evolutionary laws of the living world (survival of the fittest and the genetic mechanism of selecting the superior and eliminating the inferior). Its main features are that it operates directly on structured objects, with no requirement of differentiability or function continuity; it has inherent implicit parallelism and good global optimization capability; and, using randomized optimization, it can automatically acquire and guide the search space to optimize and adaptively adjust the search direction without needing predetermined rules.
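The randomized search described for the genetic algorithm can be sketched in Python on the toy OneMax problem, where the fitness of a bit string is simply its number of 1 bits. The problem, the function name `genetic_onemax`, the tournament size of 3, the single-point crossover, the mutation rate and the fixed random seed are all illustrative assumptions and are not taken from the patent.

```python
import random


def genetic_onemax(length=20, pop_size=30, generations=40,
                   mutation_rate=0.02, seed=0):
    """Evolve bit strings toward all ones; returns (best, history)."""
    rng = random.Random(seed)     # fixed seed keeps the run reproducible
    fitness = sum                 # OneMax: count the 1 bits
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []                  # best fitness seen at each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]          # elitism: keep the best
        while len(next_gen) < pop_size:
            # Selection: the fitter of three random individuals wins.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # Single-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, length)
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

No gradient or continuity of the fitness function is used anywhere in the loop, and because the best individual is always carried over, the best fitness never decreases from one generation to the next.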
The beneficial effects of the present invention are as follows. On the one hand, in the intelligent statistical system and method based on online big data provided by the present invention, the data statistics module obtains the statistical rule corresponding to a statistical demand and, from the data specified by the statistical rule, generates a statistic algorithm and a statistical result. Because the data statistics module can generate the statistic algorithm and obtain the statistical result itself, developers do not need to modify code for each statistical demand, which shortens the time needed to implement a statistical demand and reduces its cost. On the other hand, the data analysis module follows a classification-based approach and applies different intelligent analysis methods to different types of data. This avoids the differences in processing efficiency and the inaccurate information that arise when different data is handled by a single process, facilitates information system management and improves its efficiency, greatly increases the amount and efficiency of resource utilization on the statistics platform, helps users make correct judgments and predictions, and enables resource sharing, automatic deployment and dynamic adjustment. The invention therefore has high application value.
Brief description of the drawings
The present invention is further described below with reference to the drawings.
Fig. 1 is a structural schematic diagram of the intelligent statistical system based on online big data of the present invention;
Fig. 2 is a structural schematic diagram of the data statistics module in the intelligent statistical system based on online big data of the present invention;
Fig. 3 is a structural schematic diagram of the data analysis module in the intelligent statistical system based on online big data of the present invention;
Fig. 4 is a structural schematic diagram of the system management module in the intelligent statistical system based on online big data of the present invention;
Fig. 5 is a structural schematic diagram of the data memory module in the intelligent statistical system based on online big data of the present invention;
Fig. 6 is a structural schematic diagram of the enquiry module in the intelligent statistical system based on online big data of the present invention;
Fig. 7 is a flow chart of data statistics in the intelligent statistical system and method based on online big data of the present invention;
Fig. 8 is a flow chart of the processing performed by the system management module after it receives a request in the intelligent statistical system and method based on online big data of the present invention;
Fig. 9 is a detailed flow chart of statistical result generation in the intelligent statistical system and method based on online big data of the present invention.
In the figures: 1 - source data, 2 - data acquisition module, 3 - document management module, 4 - data statistics module, 5 - data analysis module, 6 - data memory module, 7 - enquiry module, 8 - system management module, 9 - statistics file, 10 - data file, 401 - first acquisition unit, 402 - first generation unit, 403 - first transmission unit, 404 - first receiving unit, 501 - decision tree method, 502 - association rule method, 503 - rough set method, 504 - fuzzy mathematics analysis method, 505 - artificial neural network method, 506 - chaos and fractal theory method, 507 - natural computation analysis method, 801 - second acquisition unit, 802 - second generation unit, 803 - second transmission unit, 804 - second receiving unit, 601 - data encryption unit, 602 - analysis track storage unit, 603 - doubtful point storage unit, 701 - Bluetooth server, 702 - handheld smart mobile device.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Figs. 1 to 6, in an embodiment of the present invention, the first object is to provide an intelligent statistical system based on online big data, characterized in that it comprises a system management module 8, a data acquisition module 2, a document management module 3, a data statistics module 4, a data analysis module 5, a data memory module 6, an enquiry module 7, statistics files 9 and data files 10, wherein:
The system management module 8 is used to start and stop the other modules, to schedule and manage them, and to monitor their running status;
The data acquisition module 2 is used to receive source data 1 and to pass the source data 1 and its data format to the document management module 3;
The document management module 3 is used, after receiving the source data 1, to group the received source data 1 according to a statistics file control table and the set statistical time granularity, and to store it in the respective statistics files 9. A timeout is also set: after a record has been stored, if no further data for that time granularity arrives within one statistical time granularity plus the set timeout, each group of statistics files 9 is sent to the data statistics module 4;
The data statistics module 4 is used to compute statistics on each statistics file 9 according to the statistical rules and to generate a statistical result, obtaining first data;
The data analysis module 5 is used to apply different intelligent data analysis methods to the first data obtained by the data statistics module 4 for classification processing, obtaining second data;
The data memory module 6 is used to convert the second data obtained by the data analysis module 5 into individual records and to store them in the data file 10;
The enquiry module 7 is used by users to query the data in the data file 10 according to their needs and to obtain the desired information.
The system management module 8 is also used to increase or decrease the number of data statistics modules 4 running in parallel according to the load on the computer's CPU, and to raise alarms on abnormal conditions and generate log information.
The source data 1 includes: e-commerce data, tourism and transportation data, financial data, retail industry data, medical industry data, information and entertainment industry data, public policy information data, and online operation log data.
The data statistics module 4 includes a first acquisition unit 401, a first generation unit 402, a first transmission unit 403 and a first receiving unit 404, wherein:
The first acquisition unit 401 is used to read the statistical rule corresponding to a statistical demand according to the identifier corresponding to that statistical demand, and to obtain the data specified by the statistical rule, wherein the data specified by the statistical rule includes: the data table to be counted, the static fields, the measurement type and the ordering rule of the statistical result;
The first generation unit 402 is used to generate a statistic algorithm according to the obtained static fields and measurement type, and to generate a statistical result according to the statistic algorithm, the data table to be counted and the ordering rule of the statistical result;
The first transmission unit 403 is used to send a request to call a statistical demand, so that the system management module returns the identifier corresponding to the statistical demand in response to that request;
The first receiving unit 404 is used to receive the identifier corresponding to the statistical demand.
The system management module 8 includes a second acquisition unit 801, a second generation unit 802, a second transmission unit 803 and a second receiving unit 804, wherein:
The second acquisition unit 801 is used to select, according to a statistical demand request, the data table to be counted that corresponds to the statistical demand, and to obtain the static fields, the measurement type and the ordering rule of the statistical result;
The second generation unit 802 is used to generate the statistical rule corresponding to the statistical demand and the identifier corresponding to the statistical demand according to the data table to be counted, the static fields, the measurement type and the ordering rule of the statistical result, so that the data statistics system can generate a statistic algorithm according to the statistical rule;
The second transmission unit 803 is specifically used to send the identifier corresponding to the statistical demand in response to the request to call the statistical demand;
The second receiving unit 804 is used to receive the request to call the statistical demand.
The data memory module 6 includes: a data encryption unit 601, an analysis track storage unit 602 and a doubtful point storage unit 603, wherein:
The data encryption unit 601 is used to set data access permissions;
The analysis track storage unit 602 is used to identify and store the analysis track of the data;
The doubtful point storage unit 603 is used to update data in which doubtful points exist so that it can be called by the data analysis module, and to issue an automatic early warning when information of the same category appears.
The enquiry module 7 includes: a Bluetooth server 701 and a handheld smart mobile device 702.
Please refer to Figs. 7 to 9. In an embodiment of the present invention, a second object is to provide an intelligent statistical method based on online big data, characterized in that it includes an online data statistical method and an intelligent data analysis method.
The online data statistical method comprises the steps of:
The data statistics module 4 sends a call-statistical-demand request, so that the system management module 8 returns the identifier corresponding to the statistical demand according to that request;
The system management module 8 obtains, according to the statistical demand request, the data table to be counted that corresponds to the statistical demand, the statistics fields, the measurement type, and the ordering rule of the statistical result;
The system management module 8 generates, according to the data table to be counted, the statistics fields, the measurement type, and the ordering rule of the statistical result, the statistical rule corresponding to the statistical demand and the identifier corresponding to the statistical demand, so that the data statistics system can generate the statistic algorithm according to the statistical rule;
The data statistics module 4 receives the identifier corresponding to the statistical demand;
The data statistics module 4 obtains the statistical rule corresponding to the statistical demand according to the identifier corresponding to the statistical demand;
The data statistics module 4 obtains the data specified by the statistical rule, wherein the data specified by the statistical rule includes: the data table to be counted, the statistics fields, the measurement type, and the ordering rule of the statistical result;
The data statistics module 4 generates a statistic algorithm according to the obtained statistics fields and measurement type;
The data statistics module 4 generates the statistical result according to the statistic algorithm, the data table to be counted, and the ordering rule of the statistical result.
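The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation described in the patent: the rule layout (`table`, `group_field`, `value_field`, `measure`, `order`), the identifier `demand-001`, the `RULES` and `TABLES` dictionaries, and the function names are all hypothetical. The point it shows is that a statistical rule stored as data can be turned into a statistic algorithm at run time, so a new statistical demand needs a new rule rather than new code.

```python
from collections import defaultdict

# Hypothetical mapping from measurement type to an aggregation function.
AGGREGATORS = {
    "count": len,
    "sum": sum,
    "max": max,
    "min": min,
    "avg": lambda values: sum(values) / len(values),
}

# Hypothetical rule store kept by the system management module:
# identifier of a statistical demand -> statistical rule.
RULES = {
    "demand-001": {
        "table": "orders",         # data table to be counted
        "group_field": "region",   # statistics field used for grouping
        "value_field": "amount",   # statistics field that is measured
        "measure": "sum",          # measurement type
        "order": "desc",           # ordering rule of the statistical result
    }
}

# Hypothetical in-memory stand-in for the data tables.
TABLES = {
    "orders": [
        {"region": "north", "amount": 10},
        {"region": "south", "amount": 25},
        {"region": "north", "amount": 5},
    ]
}


def build_statistic_algorithm(rule):
    """Generate a statistic algorithm (a callable) from a statistical rule."""
    aggregate = AGGREGATORS[rule["measure"]]
    descending = rule.get("order", "desc") == "desc"

    def run(rows):
        groups = defaultdict(list)
        for row in rows:
            groups[row[rule["group_field"]]].append(row[rule["value_field"]])
        result = [(key, aggregate(values)) for key, values in groups.items()]
        # Apply the ordering rule to the statistical result.
        return sorted(result, key=lambda item: item[1], reverse=descending)

    return run


def run_statistics(identifier, tables):
    """Look up the rule by identifier, generate the algorithm, and run it."""
    rule = RULES[identifier]
    algorithm = build_statistic_algorithm(rule)
    return algorithm(tables[rule["table"]])


print(run_statistics("demand-001", TABLES))
```

Adding a second demand in this sketch means adding another entry to `RULES`; `build_statistic_algorithm` itself stays unchanged.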
The intelligent data analysis method adopts the idea of classified analysis: the data analysis module 5 applies intelligent data analysis technology and uses a different category of data analysis method for each type of data, including the decision tree method 501, the association rule method 502, the rough set method 503, the fuzzy mathematics analysis method 504, the artificial neural network method 505, the chaos and fractal theory method 506, and the natural computation analysis method 507.
The first is the decision tree method 501. Given the known probabilities of various situations, the decision tree method 501 builds a decision tree to obtain the probability that the expected net present value is greater than or equal to zero, thereby assessing project risk and judging feasibility. It is an intuitive graphical way of applying probability analysis. It classifies data on the basis of information theory; it is fairly accurate, its output is easy to understand, and it is efficient, but it cannot be used to process and analyze complex data;
Common methods include classification and regression trees (CART) and chi-square automatic interaction detection (CHAID). A classification tree is mainly used to label and classify data records, while a regression tree is mainly used to estimate the value of a target variable.
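To make the information-theoretic basis concrete, the sketch below computes the information gain of a candidate split, which is the criterion used by ID3-style decision trees. It is an illustrative assumption: the patent does not say which split criterion is used, CART in practice usually relies on Gini impurity or variance reduction, and the record fields (`overdue`, `channel`, `risk`) are invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(records, attribute, label):
    """Entropy reduction obtained by splitting the records on one attribute."""
    base = entropy([r[label] for r in records])
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[label] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder


# Hypothetical records: "overdue" separates the classes perfectly, "channel" does not.
records = [
    {"overdue": "yes", "channel": "web", "risk": "high"},
    {"overdue": "yes", "channel": "app", "risk": "high"},
    {"overdue": "no", "channel": "web", "risk": "low"},
    {"overdue": "no", "channel": "app", "risk": "low"},
]

for attribute in ("overdue", "channel"):
    print(attribute, information_gain(records, attribute, "risk"))
```

A tree builder would pick the attribute with the highest gain (`overdue` here) as the next split and recurse on each subset.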
The second method is the association rule method 502, which is mainly used on transaction databases. Association rule analysis discovers valuable associations or correlations between itemsets in large volumes of data. Such transaction databases usually contain enormous amounts of data, so current algorithms concentrate on reducing the search space. Common association rule algorithms include the Apriori algorithm, partition-based algorithms, and the FP-tree frequent-set algorithm.
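As an illustration of how Apriori cuts down the search space, the following minimal sketch mines frequent itemsets level by level and discards any candidate that has an infrequent subset. The transactions and the support threshold are invented for the example, and the code is a teaching sketch rather than the procedure used by the patent.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return a dict mapping each frequent itemset to its support count."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]
frequent_itemsets = apriori(baskets, min_support=2)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With these baskets every single item and every pair is frequent, while the three-item set appears only once and is dropped.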
The third method is the rough set method 503. The rough set method 503 does not require a subjective evaluation of the data: by observing the data alone it can remove redundant information, and it supports big data well. Its ideas come mainly from statistics and machine learning, but it is not a simple application of those two tools. It is based on rough set theory and takes the information system represented by a data table as its carrier; by analyzing the properties of a given data set, its rough classification, and the certainty and coverage factors of decision rules, it extracts implicit, potentially useful knowledge.
The rough set method does not need a subjective evaluation of the knowledge or data; it can delete redundant information from the observed data alone, is well suited to parallel computation, and provides a direct explanation of the result.
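The core rough set operation, approximating a target set from the equivalence classes induced by the observed attributes, can be sketched as follows. The object names and attributes are hypothetical, and this shows only the standard lower and upper approximation, not the full analysis described above.

```python
from collections import defaultdict


def approximations(objects, condition_attrs, target):
    """Return the (lower, upper) approximation of a target set of object names.

    Objects with identical values on condition_attrs are indiscernible and
    fall into the same equivalence class.
    """
    classes = defaultdict(set)
    for name, attrs in objects.items():
        key = tuple(attrs[a] for a in condition_attrs)
        classes[key].add(name)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the target: certain members
            lower |= block
        if block & target:    # class overlaps the target: possible members
            upper |= block
    return lower, upper


# Hypothetical information table.
patients = {
    "p1": {"fever": "yes", "cough": "yes"},
    "p2": {"fever": "yes", "cough": "yes"},
    "p3": {"fever": "no", "cough": "yes"},
    "p4": {"fever": "no", "cough": "no"},
}
flu = {"p1", "p3"}

lower, upper = approximations(patients, ["fever", "cough"], flu)
print(sorted(lower), sorted(upper))
```

Here `p1` and `p2` cannot be told apart by the observed attributes, so `p1` is only a possible member of the target set; the gap between the two approximations is what makes the set rough.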
The fourth method is the fuzzy mathematics analysis method 504, which performs fuzzy analysis on practical problems and, compared with other analysis methods, can achieve more objective results. In the real world there is usually some uncertainty between objective things. The more complex a system is, the lower its precision, which also means the stronger its fuzziness. During data analysis, fuzzy set methods are used to carry out fuzzy evaluation, fuzzy decision-making, fuzzy prediction, fuzzy pattern recognition, and fuzzy cluster analysis on practical problems, which yields better and more objective results.
The shortcomings of fuzzy analysis are mainly the following: it is user-driven, with too much user involvement; it handles only a single kind of variable and cannot handle qualitative variables or complex data such as nonlinear data and multimedia data; its main purpose is to discover facts or rules through queries, so it has little influence on prediction and decision-making; and it relies too heavily on subjective experience.
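A small example of fuzzy evaluation is the weighted-average form of fuzzy comprehensive evaluation, in which a weight vector over factors is combined with a membership matrix over grades. The factor weights, grade names, and membership values below are assumptions made up for illustration; the patent does not specify them.

```python
def fuzzy_evaluate(weights, membership):
    """Combine factor weights with a membership matrix (factors x grades).

    Returns the normalized membership of the evaluated object in each grade,
    using the weighted-average operator.
    """
    grades = len(membership[0])
    scores = [
        sum(w * row[g] for w, row in zip(weights, membership))
        for g in range(grades)
    ]
    total = sum(scores)
    return [s / total for s in scores]


GRADES = ["good", "fair", "poor"]

# Hypothetical factor weights (they sum to 1).
weights = [0.5, 0.3, 0.2]

# Hypothetical membership of each factor in each grade (each row sums to 1).
membership = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
]

result = fuzzy_evaluate(weights, membership)
best_grade = GRADES[result.index(max(result))]
print(result, best_grade)
```

The weights here are chosen by hand, which also illustrates the reliance on subjective experience noted above.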
The fifth method is the artificial neural network method 505, which has a self-learning function and, on that basis, an associative memory function. An artificial neural network is a mathematical model that processes information using a structure similar to the synaptic connections of the brain; the model consists of a large number of interconnected nodes (or neurons). Each node represents a specific output function, called the activation (excitation) function. Each connection between two nodes carries a weighting value for the signal passing through it, called the weight, which serves as the memory of the artificial neural network. The output of the network varies with the connection pattern, the weights, and the activation function. The network itself is usually an approximation of some algorithm or function found in nature, or it may express a logic strategy.
Typical neural network models fall into three main categories: feed-forward neural network models, feedback neural network models, and self-organizing map models. Artificial neural networks are characterized by nonlinearity, non-locality, non-stationarity, and non-convexity, and they have three main advantages: first, a self-learning function; second, an associative memory function; third, the ability to find optimized solutions at high speed;
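The node, weight, and activation function described above can be illustrated with the forward pass of a tiny feed-forward network. The layer sizes and weight values are arbitrary assumptions, and training (the self-learning part) is deliberately left out of this sketch.

```python
import math


def sigmoid(x):
    """A common activation (excitation) function."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Each node outputs the activation of its weighted input sum plus a bias."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, network):
    """Feed the inputs through every layer of a feed-forward network in turn."""
    for weights, biases in network:
        inputs = layer(inputs, weights, biases)
    return inputs


# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node.
NETWORK = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2]),
    ([[1.2, -0.7]], [0.05]),
]

# The same shape with all weights and biases at zero, for comparison.
ZERO_NETWORK = [
    ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),
    ([[0.0, 0.0]], [0.0]),
]

print(forward([1.0, 2.0], NETWORK))
print(forward([1.0, 2.0], ZERO_NETWORK))
```

With all weights at zero every node outputs 0.5 regardless of the input, which shows that what the network "remembers" lives entirely in the weights.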
The sixth method is the chaos and fractal theory method 506, which is mainly used to explain phenomena found in nature and society. It is generally used for intelligent cognition research and can also be applied in many fields such as automatic control.
Chaos and fractal theory are two key concepts in nonlinear science; they study the relationship between determinism and randomness inside nonlinear systems. Chaos describes an unstable motion of a nonlinear dynamic system whose trajectory is confined to a finite region but never repeats. Fractals explain objects that look disordered and changeable on the surface yet in fact harbor an inherent regularity. The two can therefore be used to explain many common phenomena in nature and in the social sciences. Their theory and methods can serve as the basis for applications in many fields, such as intelligent cognition research, graphics and image processing, automatic control, and economic management.
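The description of chaos as a bounded trajectory that never repeats can be seen in the logistic map, a standard textbook example chosen here for illustration and not taken from the patent. The sketch iterates the map from two nearly identical starting points and compares the orbits.

```python
def logistic_orbit(x0, r=4.0, steps=60):
    """Iterate the logistic map x -> r * x * (1 - x) starting from x0."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit


# Two starting points that differ by one part in a billion.
orbit_a = logistic_orbit(0.2)
orbit_b = logistic_orbit(0.2 + 1e-9)

largest_gap = max(abs(a - b) for a, b in zip(orbit_a, orbit_b))
print(min(orbit_a), max(orbit_a), largest_gap)
```

Both orbits stay inside the interval from 0 to 1, yet the tiny initial difference grows until the two trajectories bear no resemblance to each other: the rule is fully deterministic, but the long-term behavior looks random.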
The seventh method is the natural computation analysis method 507. According to the biological level being simulated, it can generally be divided into three types of analysis method: swarm intelligence algorithms, immune algorithms, and genetic algorithms. Swarm intelligence mainly studies collective behavior. Immune algorithms are diverse; the classical ones include negative selection and clonal selection. Genetic algorithms are randomized search methods capable of global optimization: in practice they can generally obtain the optimized search space and, on that basis, automatically adjust the search direction, without needing fixed rules at any point in the process. Genetic algorithms are currently widely used in many industries and have achieved good results.
Natural computation refers to simulating, under the inspiration of organisms in nature, the dynamic processes that occur in nature and can readily be interpreted as computational processes. Depending on the biological level that is simulated, there are swarm intelligence algorithms, immune algorithms, genetic algorithms, and so on.
Swarm intelligence (SI) is an emerging evolutionary computing technique that imitates the foraging and nest-building behavior of animals and insects in nature. It studies the collective behavior of decentralized systems composed of a number of simple individuals, each of which interacts with other individuals and with the environment. The main SI algorithms at present are particle swarm optimization, the ant colony algorithm, the cultural algorithm, the artificial fish swarm algorithm, and foraging algorithms. Classical immune algorithms include negative selection, clonal selection, immune networks, and danger theory.
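Particle swarm optimization, the first SI algorithm listed, can be sketched as below. The objective function (`sphere`), the coefficient values 0.7 and 1.5, and the swarm size are common textbook choices assumed for this example; none of them comes from the patent.

```python
import random


def sphere(position):
    """Toy objective to minimize: the sum of squares, with its minimum at the origin."""
    return sum(x * x for x in position)


def particle_swarm(objective, dimensions=2, particles=15, iterations=60, seed=1):
    """Minimal particle swarm optimizer; returns (best position, initial best value)."""
    rng = random.Random(seed)
    positions = [[rng.uniform(-5, 5) for _ in range(dimensions)] for _ in range(particles)]
    velocities = [[0.0] * dimensions for _ in range(particles)]
    personal_best = [p[:] for p in positions]
    global_best = min(personal_best, key=objective)[:]
    initial_value = objective(global_best)

    for _ in range(iterations):
        for i in range(particles):
            for d in range(dimensions):
                # Inertia, plus a pull toward the particle's own best position,
                # plus a pull toward the best position found by the whole swarm.
                velocities[i][d] = (
                    0.7 * velocities[i][d]
                    + 1.5 * rng.random() * (personal_best[i][d] - positions[i][d])
                    + 1.5 * rng.random() * (global_best[d] - positions[i][d])
                )
                positions[i][d] += velocities[i][d]
            if objective(positions[i]) < objective(personal_best[i]):
                personal_best[i] = positions[i][:]
                if objective(personal_best[i]) < objective(global_best):
                    global_best = personal_best[i][:]
    return global_best, initial_value


best_position, initial_value = particle_swarm(sphere)
print(initial_value, sphere(best_position))
```

Each particle follows only simple local rules, and the improvement comes from the shared knowledge of the swarm's best position, which matches the description of simple individuals interacting with each other and their environment.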
A genetic algorithm is a randomized search method derived from the evolutionary laws of the living world (survival of the fittest and the genetic mechanism of selecting the superior and eliminating the inferior). Its main characteristics are that it operates directly on structured objects, with no requirement of differentiability or function continuity; it has inherent implicit parallelism and good global optimization ability; and, using a probabilistic optimization method, it can automatically obtain and guide the optimized search space and adaptively adjust the search direction without needing fixed rules.
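A minimal genetic algorithm along these lines is sketched below on the toy "OneMax" problem (maximize the number of 1 bits in a bit string). The problem, the tournament selection, the single-point crossover, the elitism, and all parameter values are assumptions chosen to keep the example short; the patent does not describe a specific variant.

```python
import random


def fitness(bits):
    """OneMax fitness: the number of 1 bits in the individual."""
    return sum(bits)


def evolve(length=20, population_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Run a small genetic algorithm; returns (best individual, best fitness per generation)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Elitism: the fittest individual survives unchanged.
        next_generation = [population[0][:]]
        while len(next_generation) < population_size:
            # Tournament selection: the fitter of three random individuals becomes a parent.
            mother = max(rng.sample(population, 3), key=fitness)
            father = max(rng.sample(population, 3), key=fitness)
            # Single-point crossover followed by bit-flip mutation.
            cut = rng.randint(1, length - 1)
            child = mother[:cut] + father[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_generation.append(child)
        population = next_generation
    best = max(population, key=fitness)
    return best, history


best, history = evolve()
print(history[0], fitness(best))
```

The fitness function is only ever evaluated, never differentiated, which reflects the point that no derivative or continuity is required; the search direction emerges from selection pressure rather than from a fixed rule.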
The working principle of the invention is:
In the intelligent statistical system and method based on online big data provided by the invention, on the one hand, the data statistics module obtains the statistical rule corresponding to a statistical demand and, according to the data specified by the statistical rule, generates a statistic algorithm and a statistical result. Because the data statistics module can generate the statistic algorithm and obtain the statistical result itself, developers do not need to modify code for each statistical demand, which shortens the time needed to implement a statistical demand and reduces its cost. On the other hand, the data analysis module adopts the idea of classified analysis and applies different intelligent analysis methods to different types of data. This avoids the poor processing efficiency and inaccurate information that can result from handling different data with the same method, facilitates information management, and improves information management efficiency. It can greatly increase the amount and efficiency of resource use on the statistics platform, help users make correct judgments and predictions, and enable the sharing, automatic deployment, and dynamic adjustment of resources, and it therefore has high application value.
It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential attributes. Therefore, from whichever point of view, the embodiments are to be considered illustrative and not restrictive, and the scope of the present invention is defined by the appended claims rather than by the above description; all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be included in the present invention. Any reference signs in the claims shall not be construed as limiting the claims concerned.

Claims (10)

1. An intelligent statistical system based on online big data, characterized in that it comprises a system management module (8), a data acquisition module (2), a document management module (3), a data statistics module (4), a data analysis module (5), a data memory module (6), an enquiry module (7), statistics files (9), and a data file (10), wherein:
the system management module (8) is used to start, stop, schedule, and monitor the running state of each of the other modules;
the data acquisition module (2) is used to receive source data (1) and to pass the source data (1) and its data format to the document management module (3);
the document management module (3) is used, after receiving the source data (1), to group the received source data (1) according to the set statistical time granularity and a statistics file control table and to store it in the respective statistics files (9); meanwhile, a timeout is set, and after a piece of data is stored, if no data of that time granularity has arrived once one statistical time granularity plus the set timeout has elapsed, each group of statistics files (9) is sent to the data statistics module (4);
the data statistics module (4) is used to compute statistics on each statistics file (9) according to the statistical rule and generate a statistical result, obtaining first data;
the data analysis module (5) is used to apply different intelligent data analysis methods to classify and process the first data obtained by the data statistics module (4), obtaining second data;
the data memory module (6) is used to convert the second data obtained by the data analysis module (5) into individual records and store them in the data file (10);
the enquiry module (7) is used by a user to query the data in the data file (10) as needed and obtain the desired information.
2. The intelligent statistical system based on online big data according to claim 1, characterized in that the system management module (8) is also used to increase or decrease the number of data statistics modules (4) processing in parallel according to the load of the computer CPU, and is also used to raise alarms on abnormal conditions and generate log information.
3. The intelligent statistical system based on online big data according to claim 2, characterized in that the source data (1) includes: e-commerce data, transportation and tourism data, financial data, retail industry data, medical industry data, information and entertainment industry data, public policy information data, and online operation log data.
4. The intelligent statistical system based on online big data according to claim 3, characterized in that the data statistics module (4) includes a first acquisition unit (401), a first generation unit (402), a first sending unit (403), and a first receiving unit (404), wherein:
the first acquisition unit (401) is used to read the statistical rule corresponding to the statistical demand according to the identifier corresponding to the statistical demand, and to obtain, according to the statistical rule, the data specified by the statistical rule, wherein the data specified by the statistical rule includes: the data table to be counted, the statistics fields, the measurement type, and the ordering rule of the statistical result;
the first generation unit (402) is used to generate a statistic algorithm according to the obtained statistics fields and measurement type, and to generate the statistical result according to the statistic algorithm, the data table to be counted, and the ordering rule of the statistical result;
the first sending unit (403) is used to send a call-statistical-demand request, so that the system management module returns the identifier corresponding to the statistical demand according to that request;
the first receiving unit (404) is used to receive the identifier corresponding to the statistical demand.
5. The intelligent statistical system based on online big data according to claim 4, characterized in that the system management module (8) includes a second acquisition unit (801), a second generation unit (802), a second sending unit (803), and a second receiving unit (804), wherein:
the second acquisition unit (801) is used to select, according to the statistical demand request, the data table to be counted that corresponds to the statistical demand, and to obtain the statistics fields, the measurement type, and the ordering rule of the statistical result;
the second generation unit (802) is used to generate, according to the data table to be counted, the statistics fields, the measurement type, and the ordering rule of the statistical result, the statistical rule corresponding to the statistical demand and the identifier corresponding to the statistical demand, so that the data statistics system can generate the statistic algorithm according to the statistical rule;
the second sending unit (803) is specifically used to send the identifier corresponding to the statistical demand according to the call-statistical-demand request;
the second receiving unit (804) is used to receive the call-statistical-demand request.
6. The intelligent statistical system based on online big data according to claim 5, characterized in that the data memory module (6) includes a data encryption unit (601), an analysis track storage unit (602), and a doubtful point storage unit (603), wherein:
the data encryption unit (601) is used to set data access permissions;
the analysis track storage unit (602) is used to identify and store the analysis track of the data;
the doubtful point storage unit (603) is used to update data that contains doubtful points, to make that data available for the data analysis module to call, and to issue an automatic early warning when information of the same category appears.
7. The intelligent statistical system based on online big data according to claim 6, characterized in that the palm enquiry module (7) includes a Bluetooth server (701) and a handheld smart mobile device (702).
8. An intelligent statistical method based on online big data, characterized in that it includes an online data statistical method and an intelligent data analysis method.
9. The intelligent statistical method based on online big data according to claim 8, characterized in that the online data statistical method comprises the steps of:
the data statistics module (4) sends a call-statistical-demand request, so that the system management module (8) returns the identifier corresponding to the statistical demand according to that request;
the system management module (8) obtains, according to the statistical demand request, the data table to be counted that corresponds to the statistical demand, the statistics fields, the measurement type, and the ordering rule of the statistical result;
the system management module (8) generates, according to the data table to be counted, the statistics fields, the measurement type, and the ordering rule of the statistical result, the statistical rule corresponding to the statistical demand and the identifier corresponding to the statistical demand, so that the data statistics system can generate the statistic algorithm according to the statistical rule;
the data statistics module (4) receives the identifier corresponding to the statistical demand;
the data statistics module (4) obtains the statistical rule corresponding to the statistical demand according to the identifier corresponding to the statistical demand;
the data statistics module (4) obtains the data specified by the statistical rule, wherein the data specified by the statistical rule includes: the data table to be counted, the statistics fields, the measurement type, and the ordering rule of the statistical result;
the data statistics module (4) generates a statistic algorithm according to the obtained statistics fields and measurement type;
the data statistics module (4) generates the statistical result according to the statistic algorithm, the data table to be counted, and the ordering rule of the statistical result.
10. The intelligent statistical method based on online big data according to claim 9, characterized in that the intelligent data analysis method adopts the idea of classified analysis: the data analysis module (5) applies intelligent data analysis technology and uses a different category of data analysis method for each type of data, including the decision tree method (501), the association rule method (502), the rough set method (503), the fuzzy mathematics analysis method (504), the artificial neural network method (505), the chaos and fractal theory method (506), and the natural computation analysis method (507).
CN201810852774.5A 2018-07-30 2018-07-30 A kind of Intelligent statistical system and method based on online big data Pending CN109063115A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810852774.5A CN109063115A (en) 2018-07-30 2018-07-30 A kind of Intelligent statistical system and method based on online big data

Publications (1)

Publication Number Publication Date
CN109063115A true CN109063115A (en) 2018-12-21

Family

ID=64831833

Country Status (1)

Country Link
CN (1) CN109063115A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345527A (en) * 2013-07-23 2013-10-09 深圳市博瑞得科技有限公司 Intelligent data statistical system
CN104820716A (en) * 2015-05-21 2015-08-05 中国人民解放军海军工程大学 Equipment reliability evaluation method based on data mining
CN105335814A (en) * 2015-09-25 2016-02-17 湖南中德安普大数据网络科技有限公司 Online big data intelligent cloud auditing method and system
CN105976109A (en) * 2016-05-05 2016-09-28 云神科技投资股份有限公司 Intelligent auditing method and system based on big data
CN105975500A (en) * 2016-04-27 2016-09-28 努比亚技术有限公司 Data processing method, data statistical system and backstage management system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109819019A (en) * 2018-12-29 2019-05-28 中国科学院计算技术研究所 Monitoring and statistical analysis technique and system for the acquisition of large scale network data
CN109819019B (en) * 2018-12-29 2021-04-27 中国科学院计算技术研究所 Monitoring and statistical analysis method and system for large-scale network data acquisition
CN111400562A (en) * 2020-03-26 2020-07-10 中扭科技(重庆)有限公司 Online data processing system and method for fastener bolt torque setting
CN111625030A (en) * 2020-05-19 2020-09-04 北京工业职业技术学院 Greenhouse environment control method, device, equipment, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181221
